The RudderStack Transformation function is a key feature of the RudderStack platform, and can be a powerful tool for the enterprises. It offers two key benefits:
It can be used for spend optimization by reducing the number of messages directed to a platform and leveraging aggregated events.
It can be used for gaining better business insights by altering the message types for facilitating better classification.
The following are some things to keep in mind while coding your transformation functions for RudderStack:
All of your code should go inside a single transformation function
The transformation function receives a collection of event messages which is a JSON structure. The following are few key fields of the structure:
The transformation function iterates through the collection and for each message, generates one or more messages (keeping the structure unchanged) for further processing.
You must not alter the structure of the event messages. There is a possibility that the message might get rejected in case the required fields are missing.
You must maintain the semantic integrity of the message payload.
You should ideally leave the structures like
rl_traits unchanged as they are concerned with the application and are user-specific. Manipulating them may lead to erroneous results. That said, there can be specific requirements for undertaking such manipulation. For example, a business might be interested to find out the distribution of their customers across Android OS variants. In such case, if the
os_version is anything between 5.0 and 5.1.1 – then the developer might want to replace the
os_version with just
The following are some typical examples of transformations that developers might choose to employ:
Suppose that an original message type is
pageview, but maybe such events have been enlisted as
track messages with event category
Page Viewed in the analytics tool for some reason. In such a scenario, the transformation function can be used to generate a 1:1 message for each such message in the collection, with the message type changed to
rl_event set to
Page Viewed and the appropriate event category populated in the message.
If the enterprise generates a large number of events, and the target analytics platform charges by volume - then only a certain number of samples can be sent for the purpose of analytics. In such a case, we can resort to a sampling strategy where we include every
nth message from the original set in the collection to be returned.
Do a user-wide aggregation of
total_payments field from within messages in the batch received having type
track and event as
spin – and generate a single event with the summed up
total_payments per user. Again, on a platform that charges by volume – such aggregation to find metrics such as average revenue per user can be handy.
For each Product included in a
track message for event
Product List Viewed – generate individual messages that might be flowing into, say, Google Analytics with the event action as
detail. This is typically done when the target platform offers a high free volume threshold and at the same time sufficient ability for granular level analysis (like Google Analytics). These are situations where one input message can give rise to more than one messages.