Warehouse Schemas

Detailed technical description of the tables created for various sources connected to the warehouse

With RudderStack, you need not define a schema for your event data before sending it from your source device. RudderStack automatically does that for you, before dumping the data into the warehouse.

This document covers the structure of this warehouse schema in detail, and the columns created in different tables based on different events.

Schema

The source name (written in snake case, e.g. source_name) is used by RudderStack to create a schema in the warehouse (called dataset in case of BigQuery). The schema name once created by RudderStack cannot be not changed subsequently, even if you edit the source name on the RudderStack dashboard.

Below is a comprehensive list of tables that are created for each RudderStack source which is connected to the warehouse:

Table

Description

<test_source_name>.tracks

Every track call sent to RudderStack is stored here. This table does not include the custom properties sent under the properties key in the event, but has some common properties (listed in the Common Properties section below) such as received_at, anonymous_id, context_device_info, etc.

<test_source_name>.<track_event_name>

All the common properties along with the custom properties for a track call are store in this table. The table name is the event name specified in the track call, for example: Added to Cart

<test_source_name>.identifies

Every identify call sent to RudderStack is stored here. This also includes all the properties passed as traits in the identify call.

<test_source_name>.users

All the unique users are stored in this table. Only the latest properties used to identify a user are stored here, including the latest anonymousId

<test_source_name>.pages

Every page call sent to RudderStack is stored here. This will include all the properties sent in the page event.

<test_source_name>.screens

Every screen call sent to RudderStack is stored here. This will include all the properties sent in the screen event.

All the properties in the event are stored as top level columns in the corresponding table. Nested properties will be prefixed with the parent key. For example, an event with properties such as{product: {name: iPhone, version: 11} will result in columns product_name and product_version).

Common Properties

Below are the list of some common properties set on all the tables mentioned above:

Column Name

Description

anonymous_id

Anonymous ID of the user

context_<prop>

Context properties set in the event

id

Unique message ID of the event, except for the users table.

The field will be the user ID in case of users table.

sent_at

Timestamp set by the RudderStack SDK when the event call was sent

received_at

Timestamp registered by RudderStack at the time of event ingestion

timestamp

Calculated by RudderStack to account for the client clock skew. The formula used: timestamp = receivedAt - (sentAt - originalTimestamp)

original_timestamp

Timestamp registered by RudderStack SDK when call was invoked

The common properties are reserved by RudderStack and in any scenario of conflict, the properties set by the user are automatically discarded.

Accepted Timestamp Formats

A subset of the ISO 8601 timestamp formats are recognized as timestamp. Anything else is not recognized as timestamp. Listed below are the accepted timestamp formats.

  • 2019-09-26

  • 2009-05-19 14:39:22

  • 2019-09-26T06:30:12.984Z

  • 2020-02-11 04:56:55.175116

  • 2019-09-26T06:30:12.984+0530

  • 2019-09-26T06:30:12.984+05:30

Reserved Keywords

There are some limitations when it comes to using reserved words in a schema, table, or column names. If such words are used in event names, traits or properties, they will be prefixed with a _when RudderStack creates tables or columns for them in your schema.

Besides, integers are not allowed at the start of the schema or table name. Hence, such schema, column or table names will be prefixed with a _.

For instance, '25dollarpurchase' will be changed to '_25dollarpurchase'.

More details about the list of reserved keywords can be found in the individual warehouse destination docs.

Data Type Mismatch Handling

A column in a table once recognized and set as a specific data type by RudderStack will not accept values of other data types. We set the value of the column as NULL for the new event

As an example, a Added to Cart track event has a property cart_value has been recognized and set as int as RudderStack based on initial events. If a new Added to Cart event has cart_value property as string, the value of it will be set as NULL in the warehouse.

Contact Us

For any issues or queries that you might come across, please feel free to contact us or start a conversation on our Discord channel. We will be happy to help you.