Warehouse Schemas
Detailed technical description of the various tables created when sending events to the data warehouse.
When sending your events to a data warehouse via RudderStack, you don't need to define a schema for the event data before sending it from your source. Instead, RudderStack automatically does that for you by following a predefined warehouse schema.
This guide details the structure of this warehouse schema and the columns created in different tables based on different events.

Schema

RudderStack uses the source name (written in snake case, for example, source_name) to create a schema in your data warehouse (also called a dataset, in the case of Google BigQuery).
The following is a list of tables created for each RudderStack source that is connected to the warehouse:
Table Name
Description
<test_source_name>.tracks
Every track call sent to RudderStack is stored in this table. It does not include the custom properties sent under the properties key in the event but has some standard properties (listed in the Standard Properties section below) such as received_at, anonymous_id, context_device_info, etc.
<test_source_name>.<track_event_name>
All the standard properties, along with the custom properties for a track call, are stored in this table. The table name is the event name specified in the track call. For example: Added to Cart
<test_source_name>.identifies
Every identify call sent to RudderStack is stored in this table.
It also includes all the properties passed as traits in the identify call.
<test_source_name>.users
RudderStack stores all the unique users in this table. Only the latest properties used to identify a user are stored in it, including the latest anonymousId.
<test_source_name>.pages
Every page call sent to RudderStack is stored in this table. This includes all the properties sent in the page event.
<test_source_name>.screens
Every screen call sent to RudderStack is stored in this table. This includes all the properties sent in the screen event.
<test_source_name>.groups
Every group call sent to RudderStack is stored in this table. This includes all the properties sent in the group event.
<test_source_name>.aliases
Every alias call sent to RudderStack is stored in this table. This includes all the properties sent in the alias event.
All the properties in the event are stored as top-level columns in the corresponding table. Nested properties will be prefixed with the parent key. For example, an event with properties as shown: { product: { name: iPhone, version: 11 }
will results in the columns product_name and product_version.

Standard RudderStack Properties

RudderStack sets the following standard properties on all the above mentioned tables:
Column Name
Description
anonymous_id
The anonymous ID of the user.
context_<prop>
Context properties set in the event.
id
The unique message ID of the event, except for the users table.
The field will be the user ID in case of users table.
sent_at
Timestamp set by the RudderStack SDK when the event was sent from the client to RudderStack.
received_at
Timestamp registered by RudderStack when the event was ingested (received).
original_timestamp
Timestamp registered by the RudderStack SDK when the event call was invoked (event was emitted in the SDK).
timestamp
This is calculated by RudderStack to account for the client clock skew. The formula used is: timestamp = receivedAt - (sentAt - originalTimestamp).
event_text
Name of the event mapped from the event key in the payload for the track events.
event
The name of the event table for the track calls.
RudderStack reserves the above-mentioned standard properties. In the case of any conflict, RudderStack automatically discards the properties set by the user.

Clock Skew Considerations

RudderStack considers the time at its end to be absolute and assumes any difference on the client-side. Thus, the client clock skew is relative.
sentAt > original_timestamp is always true. However, timestamp can be more or less than the original_timestamp. Refer to the examples below for more details.
Here's an example of original_timestamp < received_at(The client-side time is less than the server-side time):
original_timestamp
sent_at
received_at
timestamp= received_at -
(sent_at - original_timestamp)
2020-04-26 07:00:43.400
2020-04-26 07:00:45.124
2020-04-26 07:00:45.558
2020-04-26 07:00:44.834
In this case, timestamp will be greater than original_timestamp.
Here's an example of original_timestamp > received_at (The client-side time is more than the server-side time):
original_timestamp
sent_at
received_at
timestamp= received_at -
(sent_at - original_timestamp)
2020-04-26 07:00:45.558
2020-04-26 07:00:46.124
2020-04-26 07:00:43.400
2020-04-26 07:00:43.9650
In this case, timestamp will be less than original_timestamp.

Table Schemas

This section covers the major table schemas for different types of events.

Track

RudderStack creates a record in both the tracks and <event_name> tables for every track call.
A sample Add to Cart event made from the JavaScript SDK is as shown below:
1
// track call using JavaScript SDK
2
rudderanalytics.track(
3
"Add to Cart",
4
{
5
price: 5,
6
currency: "USD",
7
product_id: "P12345",
8
product_name: "N95 Mask"
9
},
10
{
11
context: {
12
ip: "0.0.0.0"
13
},
14
anonymousId: "59b703e3-467a-4a1d-9fe6-da27ed319619",
15
}
16
);
Copied!
The corresponding schemas created for the tracks and add_to_cart tables is as shown:

Table: tracks

Column
Type
Value
Description
id
String
e.g.4d5a7681-e596-40ea-a81c-bf69f9b297f1
Unique messageId generated by RudderStack.
anonymous_id
String
e.g.59b703e3-467a-4a1d-9fe6-da27ed319619
The anonymous ID of the user.
received_at
Timestamp
e.g.2020-04-26 07:00:45.558
Timestamp registered by RudderStack when the event was ingested (received).
sent_at
Timestamp
e.g.2020-04-26 07:00:45.124
Timestamp set by the SDK when the event was sent from the client to RudderStack.
original_timestamp
Timestamp
e.g.2020-04-26 07:00:43.400
Timestamp registered by the SDK when the event call was invoked (event was emitted in the SDK).
timestamp
Timestamp
e.g.2020-04-26 07:00:44.834
Calculated by RudderStack to account for the client clock skew. The formula used is: timestamp = receivedAt - (sentAt - originalTimestamp).
context_ip
String
0.0.0.0
-
context_<prop>
String, Int
e.g.context_app_version: 1.2.3, context_screen_density: 2
-
event
String
add_to_cart
The name of the corresponding event table.
event_text
String
Add to Cart
The name of the event.
uuid_ts
Timestamp
e.g.2020-04-26 07:31:54:735
Added by RudderStack for debugging purposes. Can be ignored for analytics.

Table: add_to_cart

Column
Type
Value
Note
id
String
e.g. 4d5a7681-e596-40ea-a81c-bf69f9b297f1
Unique messageIdgenerated by RudderStack
anonymous_id
String
e.g. 59b703e3-467a-4a1d-9fe6-da27ed319619
-
received_at
Timestamp
e.g. 2020-04-26 07:00:45.558
-
sent_at
Timestamp
e.g. 2020-04-26 07:00:45.124
-
original_timestamp
Timestamp
e.g.2020-04-26 07:00:43.400
-
timestamp
Timestamp
e.g. 2020-04-26 07:00:44.834
-
context_ip
String
0.0.0.0
-
context_<prop>
String, Int
e.g. context_app_version: 1.2.3, context_screen_density: 2
-
event
String
add_to_cart
The name of the event table
event_text
String
Add to Cart
The name of the event
price
Int
5
-
currency
String
USD
-
product_id
String
P12345
-
product_name
String
N95 Mask
-
uuid_ts
Timestamp
e.g. 2020-04-26 07:31:54:735
Added by RudderStack for debugging purposes. Can be ignored for analytics purposes
The event table add_to_cart has all the columns as the tracks table. It also has the properties set by the user inside the key properties.

Identify

RudderStack creates a record in the identifies table and upserts the records in the users table for every identify call, based on the userId.
In case of Google BigQuery, you can use the views created over the tables to query for unique users in the dataset. Refer to the BigQuery documentation for more details.
A sample identify event made from the JavaScript SDK is as shown below:
1
rudderanalytics.identify(
2
"U-12345",
3
{
4
5
first_name: "John",
6
last_name: "Doe",
7
age: 35
8
},
9
{
10
context: {
11
ip: "0.0.0.0"
12
},
13
anonymousId: "59b703e3-467a-4a1d-9fe6-da27ed319619"
14
}
15
);
Copied!
The corresponding schemas created for the identifies and users tables is as shown:

Table: identifies

Column
Type
Value
Note
id
String
e.g. 4d5a7681-e596-40ea-a81c-bf69f9b297f1
Unique messageId generated by RudderStack.
user_id
String
U-12345
The userId in the identify call.
anonymous_id
String
59b703e3-467a-4a1d-9fe6-da27ed319619
-
received_at
Timestamp
e.g. 2020-04-26 07:00:45.558
-
sent_at
Timestamp
e.g. 2020-04-26 07:00:45.124
-
original_timestamp
Timestamp
e.g. 2020-04-26 07:00:43.400
-
timestamp
Timestamp
e.g. 2020-04-26 07:00:44.834
-
context_ip
String
0.0.0.0
-
context_<prop>
String, Int
e.g. context_app_version: 1.2.3, context_screen_density: 2
-
email
String
-
first_name
String
John
-
last_name
String
Doe
-
age
Int
35
-
uuid_ts
Timestamp
e.g. 2020-04-26 07:31:54:735
Added by RudderStack for debugging purposes. Can be ignored for analytics.

Table: users

Column
Type
Value
Note
id
String
U-12345
The unique user ID.
received_at
Timestamp
e.g. 2020-04-26 07:00:45.558
-
context_ip
String
0.0.0.0
-
context_<prop>
e.g. String, Int
e.g. context_app_version: 1.2.3, context_screen_density: 2
-
email
String
-
first_name
String
John
-
last_name
String
Doe
-
age
Int
35
-
uuid_ts
Timestamp
e.g. 2020-04-26 07:31:54:735
Added by RudderStack for debugging purposes. Can be ignored for analytics purposes
The users table has the properties from the latest identify call made for an user. It only has the id column (same as user_id in identifies table) and does not have the anonymous_id column.
To get at a user’s anonymous_id, you can query the identifies table by grouping on the user_id column.

Page & Screen

RudderStack creates a record in the pages or screen table for every page/screen call.
A sample page event is as shown below:
1
rudderanalytics.page(
2
"Cart",
3
"Cart Viewed",
4
{
5
path: "/cart",
6
title: "Shopping Cart",
7
url: "https://rudderstack.com"
8
},
9
{
10
context: {
11
ip: "0.0.0.0"
12
},
13
anonymousId: "59b703e3-467a-4a1d-9fe6-da27ed319619"
14
}
15
);
Copied!
The corresponding schema created for the pages table is as shown:

Table: pages/screens

Column
Type
Value
Note
id
String
e.g. 4d5a7681-e596-40ea-a81c-bf69f9b297f1
-
anonymous_id
String
59b703e3-467a-4a1d-9fe6-da27ed319619
-
received_at
Timestamp
e.g. 2020-04-26 07:00:45.558
-
sent_at
Timestamp
e.g. 2020-04-26 07:00:45.124
-
original_timestamp
Timestamp
e.g. 2020-04-26 07:00:43.400
-
timestamp
Timestamp
e.g. 2020-04-26 07:00:44.834
-
context_ip
String
0.0.0.0
-
context_<prop>
e.g. String, Int
e.g. context_app_version: 1.2.3, context_screen_density: 2
-
name
String
Cart Viewed
The page name.
category
String
Cart
The page category.
path
String
/cart
-
title
String
Shopping Cart
-
url
String
-
uuid_ts
Timestamp
eg. 2020-04-26 07:31:54:735
Added by RudderStack for debugging purposes. Can be ignored for analytics.

Group

RudderStack creates a record in the groups table for every group call.
A sample group call is as shown below:
1
rudderanalytics.group(
2
"DevOps",
3
{
4
5
first_name: "John",
6
last_name: "Doe",
7
age: 35
8
},
9
{
10
context: {
11
ip: "0.0.0.0"
12
},
13
anonymousId: "59b703e3-467a-4a1d-9fe6-da27ed319619"
14
}
15
);
Copied!
The corresponding schemas created for the groups table is as shown:

Table: groups

Column
Type
Value
Note
id
String
e.g. 4d5a7681-e596-40ea-a81c-bf69f9b297f1
The group ID associated with the current user.
anonymous_id
String
59b703e3-467a-4a1d-9fe6-da27ed319619
-
group_id
String
DevOps
-
received_at
Timestamp
e.g. 2020-04-26 07:00:45.558
-
sent_at
Timestamp
e.g. 2020-04-26 07:00:45.124
-
original_timestamp
Timestamp
e.g. 2020-04-26 07:00:43.400
-
timestamp
Timestamp
e.g. 2020-04-26 07:00:44.834
-
context_ip
String
0.0.0.0
-
context_<prop>
e.g. String, Int
e.g. context_app_version: 1.2.3, context_screen_density: 2
-
uuid_ts
Timestamp
e.g. 2020-04-26 07:31:54:735
Added by RudderStack for debugging purposes. Can be ignored for analytics.

Accepted Timestamp Formats

RudderStack only recognizes a subset of the ISO 8601 timestamp formats as a timestamp. Listed below are the accepted timestamp formats:
    2019-09-26
    2009-05-19 14:39:22
    2019-09-26T06:30:12.984Z
    2020-02-11 04:56:55.175116
    2019-09-26T06:30:12.984+0530
    2019-09-26T06:30:12.984+05:30
RudderStack does not recognize any other timestamp format apart from the formats mentioned above.

Reserved Keywords

Note that there are some limitations when it comes to using the reserved words in a schema, table, or column names. If these words are used in the event names, traits or properties, RudderStack automatically prefixes a _ when creating tables or columns for them in your schema.
Also, note that integers are not allowed at the start of the schema or table name. Such schema, column, or table names will be prefixed with a _. For instance, 25dollarpurchase will be changed to _25dollarpurchase.
Find more details about the list of reserved keywords in the following warehouse-specific docs.
Warehouse
Reference
Amazon Redshift
Link
Google BigQuery
Link
Snowflake
Link

How RudderStack Handles Data Type Mismatch

Once RudderStack recognizes and sets a data type for a table column, it will not accept any values for the column that cannot be cast to the specified data type.
The values which cannot be cast are set as NULL in the table and stored in the rudder_discards table.
The schema of the rudder_discards table is as shown below:
Column
Note
row_id
Unique Identifier (ID) associated with each record in the table.
This corresponds to the unique messageId for all the tables except for users table, where it is userId.
table_name
The table name where the full record is inserted, like tracks, add_to_cart, identifies , etc.
column_name
The column name corresponding to the property to be added.
column_value
The column value which caused the data type mismatch and is discarded.
The row_id is the column which users can use to join with original table and update it as required. It is set to messageId for all tables except the users table, where it corresponds to userId.
Shown below is a sample event whose properties are discarded due to a mismatch in the data type of the previous events:
1
// intial track call using the RudderStack JavaScript SDK
2
rudderanalytics.track(
3
"Add to Cart",
4
{
5
price: 5, // originally a int value
6
currency: "USD",
7
product_id: "P12345",
8
product_name: "N95 Mask",
9
added_at: "2020-05-19 14:39:22" // originally a datetime value
10
},
11
{
12
context: {
13
ip: "0.0.0.0"
14
},
15
anonymousId: "59b703e3-467a-4a1d-9fe6-da27ed319619",
16
}
17
);
18
19
20
// subsequent track call using the RudderStack JavaScript SDK
21
rudderanalytics.track(
22
"Add to Cart",
23
{
24
price: "NA", // sent as a string in latest event
25
currency: "USD",
26
product_id: 789, // sent as int but can be casted into original string data type
27
product_name: "N95 Mask",
28
added_at: "05/25/2020" // sent as invalid datetime value
29
},
30
{
31
context: {
32
ip: "0.0.0.0"
33
},
34
anonymousId: "59b703e3-467a-4a1d-9fe6-da27ed319619",
35
}
36
);
Copied!
The subsequent records created in the rudder_discards table for the discarded properties from the second event listed in the example above are as shown:
Row ID
Table Name
Column Name
Column Value
a21620be-6502-44d6-941d-78209a386d58
add_to_cart
price
NA
1e42b2b3-8b6a-49da-8502-83a8db334375
add_to_cart
added_at
05/25/2020

FAQs

Can I change the namespace (schema name) of my data warehouse in RudderStack?

Yes, you can. Although the default namespace will be the source name with some modifications to it, RudderStack also provides an option in the dashboard to explicitly set the namespace of the dataset in the warehouse.
Refer to the warehouse-specific destination settings for configuring the namespace in the RudderStack dashboard.

Contact Us

For any queries on any of the sections covered in this guide, feel free to contact us or start a conversation on our Slack channel.
Last modified 1mo ago