This is the implementation of Rudder HA. Clustered HA is coming soon to the enterprise edition. The current HA solution consists of one data-plane instance, one load balancer, and an autoscaling group.
Each backend instance runs a backend server and a transformer server. We start by creating one instance and setting up both servers (backend and transformer) on it. A load balancer routes requests to this instance.
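As a sketch, the instance bootstrap could be done with a user-data script like the following, assuming both services are installed as systemd units. The unit names rudder-backend and rudder-transformer and the port 8080 are placeholder assumptions, not the actual package names.

```shell
#!/bin/bash
# EC2 user-data sketch: start both services on boot (hypothetical unit names).
set -euo pipefail

systemctl enable --now rudder-transformer   # transformer server
systemctl enable --now rudder-backend       # backend server (calls the transformer)

# Wait until the backend's health endpoint responds, so the instance
# will pass the target group's /health check (port is an assumption).
until curl -sf http://localhost:8080/health > /dev/null; do
  sleep 2
done
```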
If at any point this instance is stopped or its health checks fail, a second instance is automatically started with the same configuration as the first, and the load balancer routes traffic to the new instance.
Steps for setting up this HA solution in AWS. This section documents the design of how HA works; jump to the next section for the automated Terraform script.
1. Create an EC2 instance and start both the backend and transformer services on it.
2. Create an AMI image from this EC2 instance. The EC2 instance can be deleted after creating the AMI.
3. Create a launch template using this AMI image; it will be used later in the autoscaling group.
4. Create a target group with the port on which the backend is running, and add a health check (/health) to it.
5. Create an autoscaling group with desired and maximum capacity set to 1. Use the launch template from the earlier step, and attach the target group created above.
6. Create an ALB (application load balancer) and point it to the target group from earlier.
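The steps above map roughly onto the following Terraform resources. This is a minimal sketch under stated assumptions, not the actual script from the repo: the AMI ID, instance type, backend port, and the vpc_id/subnet_ids variables are all placeholders.

```hcl
# Sketch of the HA setup described above (placeholder IDs, ports, and variables).

resource "aws_launch_template" "rudder" {
  name_prefix   = "rudder-ha-"
  image_id      = "ami-xxxxxxxx"   # AMI baked from the configured instance (step 2)
  instance_type = "t3.medium"      # assumption
}

resource "aws_lb_target_group" "rudder" {
  name     = "rudder-tg"
  port     = 8080                  # port the backend listens on (assumption)
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    path = "/health"               # health check from step 4
  }
}

resource "aws_autoscaling_group" "rudder" {
  desired_capacity    = 1          # exactly one instance running at a time
  min_size            = 1
  max_size            = 1
  target_group_arns   = [aws_lb_target_group.rudder.arn]
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = aws_launch_template.rudder.id
    version = "$Latest"
  }
}

resource "aws_lb" "rudder" {
  name               = "rudder-alb"
  load_balancer_type = "application"
  subnets            = var.subnet_ids
}

resource "aws_lb_listener" "rudder" {
  load_balancer_arn = aws_lb.rudder.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.rudder.arn
  }
}
```

With max_size fixed at 1, the autoscaling group never scales out; it only replaces an instance that stops or fails its health check, which is exactly the failover behavior described earlier.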
You can use the endpoint URL of the load balancer to send your events from the SDKs, or you can use a Route53 record to add a CNAME for this endpoint and use that instead. This setup ensures that there is almost always one instance running.
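If you take the Route53 route, the record could look like the following sketch; the hosted zone variable and the events.example.com subdomain are hypothetical, and the ALB reference assumes a load balancer resource named aws_lb.rudder.

```hcl
# Sketch: CNAME pointing a custom subdomain at the ALB (hypothetical names).
resource "aws_route53_record" "rudder" {
  zone_id = var.zone_id              # hosted zone for your domain (assumption)
  name    = "events.example.com"     # hypothetical subdomain used by the SDKs
  type    = "CNAME"
  ttl     = 300
  records = [aws_lb.rudder.dns_name] # DNS name of the ALB created earlier
}
```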
This is the Terraform script that automates this whole process for you on AWS. See the ha-tf branch of our terraform repo for the script: https://github.com/rudderlabs/rudder-terraform/blob/ha-tf/ha.tf