Insights Stack

This is the CDK app that builds our insights infrastructure.

Context

RavenDB is not well suited to ad-hoc querying across datasets, so to support the reporting requirements of the business we replicate data from RavenDB to a PostgreSQL database in AWS Aurora.

We use a single database named insights, which makes cross-service joins and aggregations possible.

Setup

  • Install the AWS CDK globally: npm install -g aws-cdk@latest
  • Ensure you have ~/.aws/credentials set up, or run aws configure to create it. The default profile is used by default; run export AWS_PROFILE=<profile-to-use> if you want to use a different one.

Resources

  • Aurora PostgreSQL cluster with a single instance (see the sketch after this list)
  • Security group with port 5432 open, plus port 80 open to Cloudflare only
  • EC2 instance (production only) with Superset installed (an open-source dashboard/BI tool)
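
For orientation, here is a minimal sketch of how the cluster and security group might be declared with a recent aws-cdk-lib v2. The construct IDs, VPC lookup name, engine version and CIDR range are illustrative assumptions, not values taken from the actual stack:

import * as cdk from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as rds from 'aws-cdk-lib/aws-rds';
import { Construct } from 'constructs';

export class InsightsStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Look up the existing VPC for this account (lookup name is a placeholder)
    const vpc = ec2.Vpc.fromLookup(this, 'Vpc', { vpcName: 'platform-vpc' });

    // Security group for Postgres traffic; the source CIDR is a placeholder
    const dbSg = new ec2.SecurityGroup(this, 'InsightsDbSg', { vpc });
    dbSg.addIngressRule(ec2.Peer.ipv4('10.0.0.0/8'), ec2.Port.tcp(5432), 'Postgres');

    // Single-instance Aurora PostgreSQL cluster hosting the shared insights database
    new rds.DatabaseCluster(this, 'InsightsCluster', {
      engine: rds.DatabaseClusterEngine.auroraPostgres({
        version: rds.AuroraPostgresEngineVersion.VER_15_3, // version is illustrative
      }),
      writer: rds.ClusterInstance.provisioned('writer'),
      defaultDatabaseName: 'insights',
      securityGroups: [dbSg],
      vpc,
    });
  }
}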

Deployment

There is a GitHub workflow, infrastructure-insights, which runs on any change to the infrastructure/insights folder and deploys the stack to each of the four environments.

Because the cdk synth command requires access to the VPCs, we need to generate each stack's CloudFormation template individually while assuming the correct role for the account in question.

The deployment therefore generates four stacks in separate output folders, assuming the correct role for each account we deploy to.
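
In CDK terms this usually means instantiating one stack per environment with an explicit env, so each can be synthesised against the right account. A minimal sketch, in which the account IDs, region and import path are placeholders:

#!/usr/bin/env node
import * as cdk from 'aws-cdk-lib';
import { InsightsStack } from '../lib/insights-stack'; // import path is illustrative

const app = new cdk.App();

// One stack per environment, each pinned to its own account so cdk synth
// can run per stack under that account's role. Account IDs and region are
// placeholders, not the real values.
new InsightsStack(app, 'InsightsStack-dev', {
  env: { account: '111111111111', region: 'eu-west-2' },
});
new InsightsStack(app, 'InsightsStack-uat', {
  env: { account: '222222222222', region: 'eu-west-2' },
});
new InsightsStack(app, 'InsightsStack-sandbox', {
  env: { account: '333333333333', region: 'eu-west-2' },
});
new InsightsStack(app, 'InsightsStack-prod', {
  env: { account: '444444444444', region: 'eu-west-2' },
});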

If you need to synth and deploy locally, set the correct AWS profile (AWS_PROFILE=platform.dev, for example) and then run one of the following, depending on the environment:

npm run dev-synth
npm run dev-deploy

npm run uat-synth
npm run uat-deploy

npm run sandbox-synth
npm run sandbox-deploy

npm run prod-synth
npm run prod-deploy

Postgres Servers

We run an Aurora PostgreSQL instance in each environment's AWS account and replicate data to each. Connection information is in components/configuration/configuration.ts; the password for the hectare user is in AWS Secrets Manager in each account under the key hctr/insights/aurura/credentials.
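
For reference, fetching those credentials with the AWS SDK for JavaScript v3 looks roughly like the sketch below; the region and the JSON shape of the secret (username/password fields) are assumptions:

import { SecretsManagerClient, GetSecretValueCommand } from '@aws-sdk/client-secrets-manager';

// Region is a placeholder; the client can also pick it up from the environment.
const client = new SecretsManagerClient({ region: 'eu-west-2' });

export async function getInsightsCredentials(): Promise<{ username: string; password: string }> {
  const result = await client.send(
    new GetSecretValueCommand({ SecretId: 'hctr/insights/aurura/credentials' }),
  );
  // Assumes the secret is stored as a JSON string with username/password fields.
  return JSON.parse(result.SecretString ?? '{}');
}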

Superset EC2

In production only, we run an instance of Apache Superset for BI/dashboards, deployed as an EC2 instance in the prod account. The SSH key to access the server is in the Engineering > Platform > PrivateKeys folder in Google Drive.

A new EC2 instance can be built and Superset installed using the script infrastructure/insights/scripts/superset.sh: after spinning up a new Ubuntu Linux instance, run each step in order.

We've enabled Google OAuth2 authentication so we can log in with Hectare Google accounts.

The EC2 instance has port 80 open, but only to Cloudflare's IP addresses. The app is hosted at https://insights.wearehectare.com; we offload SSL at Cloudflare and proxy through to the EC2 instance.
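
In CDK terms, restricting port 80 to Cloudflare looks roughly like the sketch below. The CIDRs shown are only a subset of Cloudflare's published IPv4 ranges (the authoritative list is at https://www.cloudflare.com/ips/) and the construct ID is illustrative:

import * as ec2 from 'aws-cdk-lib/aws-ec2';
import { Construct } from 'constructs';

// A subset of Cloudflare's published IPv4 ranges, for illustration only;
// see https://www.cloudflare.com/ips/ for the authoritative list.
const CLOUDFLARE_CIDRS = ['173.245.48.0/20', '103.21.244.0/22', '103.22.200.0/22'];

export function supersetSecurityGroup(scope: Construct, vpc: ec2.IVpc): ec2.SecurityGroup {
  const sg = new ec2.SecurityGroup(scope, 'SupersetSg', { vpc });
  for (const cidr of CLOUDFLARE_CIDRS) {
    // Cloudflare terminates SSL and proxies plain HTTP through to the instance
    sg.addIngressRule(ec2.Peer.ipv4(cidr), ec2.Port.tcp(80), 'HTTP from Cloudflare');
  }
  return sg;
}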