Modern companies generate real-time events across their websites and applications. These events are mainly used by the product and marketing teams to better understand their customers' product interactions. They are captured in a specific format which generally includes the event name, properties, and the associated metadata.
To analyze customer behavior effectively or drive high-value marketing campaigns and personalizations, teams rely on consistent event data formats. Any inconsistency in the events can significantly lower the quality of analytics and can require a lot of time and engineering effort to clean up.
In practice, however, multiple stakeholders define and implement the event specifications differently, which introduces inconsistencies into the event data. Some of the reasons for these inconsistencies include:
Capitalization / Casing errors: For example, one event sets the product name to lower case, while another sets it to upper case.
Unit errors: For example, one event tracks the revenue in 100-dollar units, while another tracks it in dollars.
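To illustrate the two error types above, here is a minimal Python sketch of how such events might be normalized before analysis. The event shape and the `product_name` and `revenue` property names are assumptions made for this example, not fields defined by RudderStack, and `normalize_event` is a hypothetical helper.

```python
def normalize_event(event, revenue_unit_dollars=1):
    """Return a copy of the event with consistent casing and units.

    Assumed (hypothetical) shape: {"event": str, "properties": {...}}.
    revenue_unit_dollars is how many dollars one unit of the incoming
    "revenue" value represents (100 for events tracked in 100-dollar units).
    """
    props = dict(event.get("properties", {}))  # copy so the input is not mutated

    # Casing: store product names in lower case everywhere.
    if isinstance(props.get("product_name"), str):
        props["product_name"] = props["product_name"].strip().lower()

    # Units: convert revenue to plain dollars.
    if "revenue" in props:
        props["revenue"] = props["revenue"] * revenue_unit_dollars

    return {**event, "properties": props}
```

An event tracked in 100-dollar units would be passed with `revenue_unit_dollars=100`, while events already in dollars keep the default of 1.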
RudderStack's Data Governance API gives you the ability to access all your events and their metadata programmatically. This includes vital information related to the event schema, event payload versions, data types, and more.
By using the Data Governance API, the data engineering team can narrow down the specific nature and source of any event data inconsistencies. With these insights, they can update the instrumentation or use RudderStack Transformations to clean the incoming events.
At a high level, you can take the following steps to investigate and troubleshoot any event data inconsistencies:
Get all the event models into RudderStack using
Identify the source of the event models.
Count the event models to determine the number for each event type.
Identify the differences in the schema versions for a single event model using
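The counting and comparison steps above can be sketched in Python once the event models and schema versions have been fetched. This sketch deliberately does not call the API. It assumes the responses have already been loaded into plain dictionaries, and the `eventType` key and the key-to-type schema mapping are hypothetical shapes chosen for illustration.

```python
from collections import Counter


def count_by_event_type(event_models):
    """Count event models per event type.

    Assumes each model is a dict with an "eventType" key (hypothetical name).
    """
    return Counter(model["eventType"] for model in event_models)


def diff_schemas(schema_a, schema_b):
    """Compare two schema versions given as {key: data_type} mappings.

    Reports keys present in only one version and keys whose type changed.
    """
    keys_a, keys_b = set(schema_a), set(schema_b)
    return {
        "only_in_a": sorted(keys_a - keys_b),
        "only_in_b": sorted(keys_b - keys_a),
        "type_changed": {
            key: (schema_a[key], schema_b[key])
            for key in sorted(keys_a & keys_b)
            if schema_a[key] != schema_b[key]
        },
    }
```

A key that appears under `type_changed`, such as a price recorded as a float in one version and a string in another, usually points to a specific source or SDK call that needs its instrumentation fixed.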
Once you have identified the inconsistencies, you can implement specific processes and set alerts for your data governance workflows. For instance, you can create alerts that notify you of any missing keys in your events, or use RudderStack Transformations to validate your event schemas and transform the faulty incoming event payloads.
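As a rough sketch of the missing-key check mentioned above, the following Python function lists the required keys that an event does not carry. The event shape, the required key names, and the `missing_keys` helper are assumptions for illustration. How the result is turned into an alert depends on your own tooling and is not shown.

```python
def missing_keys(event, required_keys):
    """Return the required property keys absent from the event, sorted.

    Assumes properties live under event["properties"] (hypothetical shape).
    """
    props = event.get("properties") or {}
    return sorted(key for key in required_keys if key not in props)


# Example: decide whether an event should trigger an alert.
event = {"event": "Order Completed", "properties": {"revenue": 20}}
absent = missing_keys(event, ["revenue", "currency", "order_id"])
if absent:
    print(f"Event '{event['event']}' is missing keys: {', '.join(absent)}")
```

Keeping the check as a pure function makes it easy to reuse the same logic both in an alerting job and in a validation step that runs on incoming events.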