Google BigQuery
Step-by-step guide to set up Google BigQuery as a destination in RudderStack.
Google BigQuery is an industry-leading fully-managed cloud data warehouse that allows you to store and analyze petabytes of data in no time.
RudderStack lets you add Google BigQuery as a destination where you can send your customer event data from the data source of your choice.
Please check our Warehouse Schemas guide to know how events are mapped to the tables in BigQuery.
Find the open-source transformer code for this destination in our GitHub repo.

Setting up the BigQuery Project

Follow these steps to set up your BigQuery project before adding it as a destination in RudderStack:
    Create a Google Cloud Platform (GCP) project if you do not have an existing one. More details can be found here.
    Enable the BigQuery API if you have an existing project and it is not enabled. More details can be found here.
    Log into your Google BigQuery console. Copy the project ID as shown in the following image:
Project ID (highlighted)
You will require this project ID while configuring BigQuery as a destination in RudderStack.
Ensure that billing is enabled for the project to enable RudderStack to load data into the BigQuery cluster.
    Create a new Google Cloud Storage (GCS) bucket or provide an existing one to store files before loading data into your BigQuery.
To ensure loading data from cloud storage to BigQuery, make sure to co-locate your GCS storage bucket with BigQuery. More information can be found here.

Setting up the Service Account for RudderStack

    Create a new service account under IAM & Admin > Service Accounts
    Add Service Account permissions as specified below:
      Add the Storage Object Creator and Storage Object Viewer roles to the account.
      Add the BigQuery Job User and BigQuery Data Owner roles to the account.
If a dataset with the name (configurable by the setting: Namespace in the RudderStack destination settings) already exists, the role of BigQuery Data Editor would suffice instead of BigQuery Data Owner.
    Create a key for the service account with JSON as the type and store it.
Service Account Permissions
    Create and download the private JSON key which will be required while configuring BigQuery as a destination in RudderStack, as shown:
JSON key required for the RudderStack UI
Make sure that you create the service account in the same project as BigQuery.

Setting Up the Network Access

If necessary, you can whitelist the RudderStack IPs to enable network access to it.
The IPs to be whitelisted are : 3.216.35.97, 34.198.90.241 , 54.147.40.62 , 23.20.96.9, and 18.214.35.254.

Configuring Google BigQuery in RudderStack

To enable sending data to Google BigQuery, you will first need to add it as a destination in RudderStack. Once the destination is configured and enabled, events from RudderStack will start to flow to BigQuery.
To do so, please follow these steps:
    Choose a source to which you would like to add Google BigQuery as a destination.
Please follow our Adding a Source and Destination guide to know how to add a source in RudderStack.
    Once you have set up the source, select Google BigQuery from the list of destinations. Give your destination a name, and then click on Next. You will then see the following Connection Credentials screen:
Google BigQuery Configuration Settings in RudderStack
    Add your Project ID, Location (optional) and the Staging GCS Storage Bucket Name, as specified in the Setting up Google BigQuery Project section.
    Copy the contents of the credentials JSON file you created and stored, in the section above.
That's it! You have successfully added Google BigQuery as a destination in RudderStack.

Schema, Partitioned Tables, and Views

RudderStack uses the source name (written in snake case, e.g. source_name) to create a dataset in BigQuery.
More details about the tables and columns created by RudderStack can be found here.
RudderStack creates ingestion-time partition tables based on the load date, so you can take advantage of it to query a subset of data.
More details about the BigQuery partitioned tables here. Information on RudderStack creates these tables on load can be found here.
In addition to tables, a view (<table_name>_view) is created for every table for de-duplication purposes.
More information on views can be found in this guide.
We recommend that you use the corresponding view (which contains the events from the last 60 days) to avoid duplicate events in your query results. Since BigQuery views are merely logical views and are not cached, you can create a native table from it to save money by avoiding running the query that defines the view every time.
Users can modify the view query to change the time window of the view. The default value is set to 60 days.

FAQs

How are reserved words handled by RudderStack?

There are some limitations when it comes to using reserved words in a schema, table, or column names. If such words are used in event names, traits or properties, they will be prefixed with a _when RudderStack creates tables or columns for them in your schema.
Besides, integers are not allowed at the start of the schema or table name. Hence, such schema, column or table names will be prefixed with a _.
For instance, '25dollarpurchase' will be changed to '_25dollarpurchase'.

When sending data into a data warehouse, how can I change the table where this data is sent?

By default, RudderStack sends the data to the table / dataset based on the source it is connected to. For example, if the source is Google Tag Manager, RudderStack sets the schema name as gtm_*.
You can override this by setting the Namespace in the destination settings as shown:

I'm looking to send data to BigQuery through RudderStack and I'm trying to understand what data is populated in each column. How do I go about this?

You can refer to the RudderStack Warehouse Schema documentation for details on how RudderStack generates the schema in the warehouse.

I am trying to load data into my BigQuery destination and I get this error:

1
backend_1 | {Location: “”; Message: “Cannot read and write in different locations:
2
source: US, destination: us-central1""; Reason: “invalid”}"
Copied!
Make sure that both your BigQuery dataset and the bucket have the same region.

When piping data to a BigQuery destination, I can set the bucket but not a folder within the bucket. Is there a way to put Rudderstack data in a specific bucket folder?

Yes, you can set the desired folder name in the Prefix input field while configuring your BigQuery destination.

Does open-source RudderStack support near-realtime syncing to BigQuery and Event Replay?

The near-realtime BigQuery syncing feature is currently under development and is planned to be released in the coming months. Unfortunately, Event Replay is not a part of open-source RudderStack currently.

What is the current sync frequency for BigQuery?

If you're using open-source RudderStack, the minimum sync frequency is 30 minutes. If you're self-hosting the data plane or using RudderStack Cloud Pro / Enterprise, you can tweak the config to set uploadFreqInS in config.yamlto 0 and determine the best possible value for near real-time sync.
For more information, please refer to this guide.

Do I need to stop the running pipeline to change my sync frequency? Or will the new change be effective even without stopping the pipeline?

To change the sync frequency, you need not stop the pipeline.

When configuring the BigQuery destination, where does Google use the credentials JSON from?

BigQuery uses the credentials JSON from the configuration settings when setting up the destination. For more information, refer to this section.

When configuring the BigQuery destination, should the user permissions be set for the specific dataset or the whole project?

You need to set project-wide user permissions. Otherwise you may encounter permissions issues.

How long are the failed syncs retried before being aborted?

RudderStack retries the failed syncs for up to 3 hours before aborting them. For more information, refer to this FAQ guide.

Contact Us

If you come across any issues while configuring Google BigQuery with RudderStack, feel free to contact us or start a conversation on our Slack channel.
Last modified 3mo ago