RudderStack is an independent, standalone system whose only dependency is a PostgreSQL database.
RudderStack's architecture consists of two major components: the Control Plane and the Data Plane. A high-level view of RudderStack’s architecture is shown in the diagram below:
Let us look at each of the above major components in a bit more detail:
RudderStack Control Plane: The control plane mainly consists of the UI used to configure the sources and destinations of the event data. It is divided into two major components:
Web App: This is the front-end application that allows you to set up your data routing with RudderStack.
Configuration Backend: The backend lets you configure your event data sources, destinations, and connections.
RudderStack Data Plane: The data plane is the core engine that is responsible for:
Receiving and buffering the event data
Transforming it into the required destination format, and finally
Relaying it to the destination
The data plane receives event data from sources such as web apps and Android/iOS devices, transforms it, and routes the transformed data to its destination in the required format. RudderStack’s backend is written in Go.
A simplified version of the RudderStack Backend architecture is shown in the figure below:
Let us now dive deep into the components of the RudderStack Backend:
The Gateway is primarily responsible for receiving and forwarding the event data for transformation.
It accepts event requests and sends an acknowledgment back to the source: an HTTP 200 response if the event data is accepted, or an error response if it is rejected. The event data is rejected in the following scenarios:
Invalid write key
Improper request size
The source event can come from an iOS/Android device or a web application. In case of a successful receipt, the events are then forwarded for transformation.
The Gateway also temporarily stores all received event data in PostgreSQL before sending the acknowledgment of a successful receipt. Once the event is transformed and handed to the Router, the Processor deletes the stored data from the database.
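The Gateway behavior described above (validate, persist, then acknowledge) can be sketched roughly as follows. This is a minimal illustration, not RudderStack's implementation: the `Gateway` class, the `VALID_WRITE_KEYS` and `MAX_REQUEST_BYTES` values, and the 401 and 413 rejection codes are all assumptions made for the example, and an in-memory list stands in for PostgreSQL.

```python
from dataclasses import dataclass, field

# Hypothetical values for illustration; not RudderStack's actual settings.
VALID_WRITE_KEYS = {"wk_demo"}
MAX_REQUEST_BYTES = 1024


@dataclass
class Gateway:
    # Stands in for the PostgreSQL table that buffers accepted events.
    store: list = field(default_factory=list)

    def handle(self, write_key: str, body: bytes) -> int:
        """Return an HTTP-style status code for one incoming request."""
        if write_key not in VALID_WRITE_KEYS:
            return 401  # assumed status for an invalid write key
        if len(body) > MAX_REQUEST_BYTES:
            return 413  # assumed status for an improper request size
        # Persist first, acknowledge second, so an acknowledged event is never lost.
        self.store.append(body)
        return 200


gateway = Gateway()
status_ok = gateway.handle("wk_demo", b'{"event": "page_view"}')
status_bad_key = gateway.handle("wrong", b'{"event": "page_view"}')
status_too_big = gateway.handle("wk_demo", b"x" * (MAX_REQUEST_BYTES + 1))
```

Only the first request is stored and acknowledged with 200; the other two are rejected before anything is written.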
The Processor fetches the data from the Gateway and forwards it to the Transformation module. Once the event data is transformed, the Processor forwards it to the Router, so that it can be sent to the required destination.
The Router sends the transformed event data received from the Processor to the desired destinations, such as Google Analytics, Amplitude, and more. It can also send data dumps to Amazon S3 or to warehouses such as Amazon Redshift.
The Transformation Module takes the event data from the Processor and converts it into the required destination format. It then sends the transformed event data back to the Processor, so that it can be forwarded to the Router and eventually to the desired destination.
As discussed previously, RudderStack also supports user-specific transformations, in which custom functions modify events, aggregate them, sample them, and so on.
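To make the three kinds of user-specific transformation concrete, here is a small sketch using plain Python functions. It does not use RudderStack's transformation API; the function names (`add_source_tag`, `sample_every`, `count_by_type`) and the event fields are hypothetical.

```python
def add_source_tag(event: dict) -> dict:
    """Modify: return a copy of the event with an extra field."""
    return {**event, "source": "web"}


def sample_every(events: list, n: int) -> list:
    """Sample: keep every n-th event, starting with the first."""
    return [e for i, e in enumerate(events) if i % n == 0]


def count_by_type(events: list) -> dict:
    """Aggregate: count events per event type."""
    counts: dict = {}
    for e in events:
        counts[e["type"]] = counts.get(e["type"], 0) + 1
    return counts


events = [{"type": "track", "id": i} for i in range(6)]
tagged = [add_source_tag(e) for e in events]
sampled = sample_every(tagged, 3)  # keeps the events with id 0 and id 3
counts = count_by_type(events + [{"type": "page", "id": 6}])
```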
The following flow explains the working of the backend:
The client SDK sends events to the Gateway.
The Gateway then:
Stores the event data in the database (PostgreSQL).
Sends an HTTP 200 status acknowledging receipt of the data.
The Processor picks up the data from the Gateway and forwards it to the Transformation module.
The Transformation module sends the transformed data back to the Processor.
Once the event is transformed and sent to the Router, it is deleted from the Gateway store.
The Router then:
Forwards the transformed event data to the desired destinations.
Stores the information in a separate table in the database.
Once the transformed data reaches the destination, the Router deletes the event data from its table.
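The flow above can be traced end to end with a short simulation. It is a sketch under stated assumptions: in-memory lists stand in for the Gateway and Router tables in PostgreSQL and for the destination, and every function name and the shape of the transformed payload are hypothetical.

```python
gateway_store = []  # stands in for the Gateway's table in PostgreSQL
router_store = []   # stands in for the Router's separate table
delivered = []      # stands in for the destination


def gateway_receive(event: dict) -> int:
    """Store the event, then acknowledge it with HTTP 200."""
    gateway_store.append(event)
    return 200


def transform(event: dict) -> dict:
    """Convert the event to a hypothetical destination format."""
    return {"name": event["event"], "user": event["userId"]}


def processor_run() -> None:
    """Transform each stored event and hand it to the Router."""
    while gateway_store:
        event = gateway_store[0]
        router_store.append(transform(event))
        # Delete from the Gateway store only after the Router has the event.
        gateway_store.pop(0)


def router_run() -> None:
    """Deliver each payload, then delete it from the Router table."""
    while router_store:
        payload = router_store[0]
        delivered.append(payload)
        router_store.pop(0)


ack = gateway_receive({"event": "signup", "userId": "u1"})
processor_run()
router_run()
```

In both stages the record is removed only after the next stage holds it, which is why an event is always present in at least one store until it is delivered.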
We saw how the RudderStack data plane plays a crucial role in receiving, storing, and transforming the source events and delivering them reliably to the destination. The backend engine can be customized with a variety of configuration options, such as backing up events to S3 or rejecting malicious requests by defining the maximum event size. Although the default configuration works well for most companies, RudderStack gives you the flexibility to customize it if required.
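The idea of keeping the defaults and overriding only what you need can be illustrated as follows. The `BackendConfig` class, its option names, and its default values are hypothetical and do not correspond to RudderStack's actual configuration keys.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BackendConfig:
    # Hypothetical option names and defaults, for illustration only.
    max_request_bytes: int = 4096
    backup_to_s3: bool = False
    s3_bucket: str = ""


defaults = BackendConfig()
# Override only the S3 backup options; everything else keeps its default.
custom = replace(defaults, backup_to_s3=True, s3_bucket="my-event-backups")
```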