High Availability

A guide to high availability with RudderStack: what Rudder HA is, and detailed steps for setting it up on AWS.

This page describes the current implementation of Rudder HA. Clustered HA is coming soon to the enterprise edition. The current HA solution consists of one data-plane instance, one load balancer, and an autoscaling group.

Rudder HA

Each backend instance runs a backend server and a transformer server. We start by creating one instance and setting up both servers (backend and transformer) on it. A load balancer routes requests to this instance.

If this instance is stopped or its health checks fail at any point, a second instance is automatically started with the same configuration as the first, and the load balancer routes traffic to the new instance.

There may be about 2 minutes of downtime before the new instance is ready to serve requests (the exact downtime depends on the cloud provider). However, the SDKs have a retry mechanism, which should ensure that events sent during the downtime are not lost.
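
To illustrate why client-side retries can cover a short failover window, here is a minimal sketch of retrying with exponential backoff. It is not the retry policy of any RudderStack SDK: the function name `send_with_retry`, the `send` callable, and the default attempt count and delay are all assumptions chosen for illustration.

```python
import time


def send_with_retry(send, event, max_attempts=8, base_delay=1.0, sleep=time.sleep):
    """Call send(event) until it succeeds or the attempts run out.

    `send` is a hypothetical callable that raises ConnectionError while the
    data plane is unreachable. Returns True on success, False otherwise.
    """
    for attempt in range(max_attempts):
        try:
            send(event)
            return True
        except ConnectionError:
            if attempt == max_attempts - 1:
                # Out of attempts: report failure to the caller.
                return False
            # Wait 1s, 2s, 4s, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    return False
```

With these assumed defaults, the waits between the eight attempts add up to 127 seconds, which is roughly the 2 minute window mentioned above. A real client would usually also persist unsent events so they survive a restart.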

Setting up on AWS

The steps below set up this HA solution on AWS. This section documents the design of how HA works. To automate the setup, skip ahead to the Terraform script in the next section.

  1. Create an EC2 instance and start both the backend and transformer services on it.

  2. Create an AMI from this EC2 instance. The EC2 instance can be deleted after the AMI is created.

  3. Create a launch template using this AMI. It will be used later in the autoscaling group.

  4. Create a target group with the port on which the backend is running, and add a health check (/health) to it.

  5. Create an autoscaling group with the desired and maximum capacity set to 1. Use the launch template from step 3 in this autoscaling group, and add the target group from step 4 to it.

  6. Create an ALB (Application Load Balancer) and point it to the target group from step 4.
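
As a rough sketch of steps 3 to 6, the function below builds the parameters for each resource as plain Python dictionaries. The keys follow the keyword arguments of the boto3 calls `create_launch_template` (EC2), `create_target_group` and `create_load_balancer` (ELBv2), and `create_auto_scaling_group` (Auto Scaling), but the function itself makes no AWS calls. The resource name `rudder-ha`, the instance type, the backend port 8080, `MinSize`, and the `ELB` health check type are assumptions, not values taken from this guide.

```python
def build_ha_requests(ami_id, vpc_id, subnet_ids, backend_port=8080):
    """Return request parameters for steps 3-6 (nothing is created here)."""
    name = "rudder-ha"  # hypothetical name shared by all resources

    # Step 3: launch template based on the AMI from step 2.
    launch_template = {
        "LaunchTemplateName": name,
        "LaunchTemplateData": {
            "ImageId": ami_id,
            "InstanceType": "t3.medium",  # assumed; size for your workload
        },
    }

    # Step 4: target group on the backend port with a /health check.
    target_group = {
        "Name": name,
        "Protocol": "HTTP",
        "Port": backend_port,  # assumed port; use the one your backend listens on
        "VpcId": vpc_id,
        "HealthCheckPath": "/health",
    }

    # Step 5: autoscaling group pinned to exactly one instance.
    autoscaling_group = {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {"LaunchTemplateName": name, "Version": "$Latest"},
        "MinSize": 1,  # assumed; the guide only fixes desired and maximum
        "MaxSize": 1,
        "DesiredCapacity": 1,
        "VPCZoneIdentifier": ",".join(subnet_ids),
        # Assumed: replace the instance when load balancer health checks fail.
        "HealthCheckType": "ELB",
        # "TargetGroupARNs" must be added once the target group exists.
    }

    # Step 6: application load balancer in the same subnets.
    load_balancer = {
        "Name": name,
        "Subnets": list(subnet_ids),
        "Type": "application",
    }

    return {
        "launch_template": launch_template,
        "target_group": target_group,
        "autoscaling_group": autoscaling_group,
        "load_balancer": load_balancer,
    }
```

Each dictionary could be passed to the matching boto3 call with `**`, for example `elbv2.create_target_group(**requests["target_group"])`. The load balancer also needs a listener that forwards to the target group, which is omitted here for brevity.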

You can use the endpoint URL of the load balancer to send events from your SDKs. Alternatively, you can add a CNAME record in Route 53 that points to this endpoint and use that instead. This setup ensures that there is almost always one instance running.
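
Before pointing SDKs at the new endpoint, it can help to confirm that the backend answers on the same /health path the target group checks. The sketch below uses only the Python standard library. The function name `is_healthy` and the injectable `fetch` parameter are assumptions made for illustration, and treating HTTP 200 as healthy is also an assumption.

```python
from urllib.error import URLError
from urllib.request import urlopen


def is_healthy(base_url, path="/health", timeout=5.0, fetch=urlopen):
    """Return True if GET base_url + path answers with HTTP 200.

    `fetch` defaults to urllib's urlopen and can be swapped out in tests.
    """
    url = base_url.rstrip("/") + path
    try:
        with fetch(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False
```

For example, `is_healthy("http://<load-balancer-endpoint>")` should return True once the instance behind the load balancer is serving requests, where the placeholder stands for your own endpoint URL or CNAME.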

AWS Terraform

This Terraform script automates the whole process for you on AWS. Refer to the ha-tf branch of our Terraform repo for the script: https://github.com/rudderlabs/rudder-terraform/blob/ha-tf/ha.tf