ClickHouse

Step-by-step guide to set up ClickHouse as a destination in RudderStack

ClickHouse is an open-source, column-oriented database management system mainly used for online analytical processing (OLAP). It is fast, and allows for real-time analysis of your data. ClickHouse was originally developed by Yandex, the Russian IT company, for the Yandex Metrica web analytics service. ClickHouse is highly scalable, and built for high performance.

You can now integrate RudderStack with ClickHouse seamlessly and store your customer event data into it, from the source of your choice.

Please check our Warehouse Schemas guide to know how events are mapped to the tables in ClickHouse.

Setting ClickHouse Permissions

You to need to set date_time_input_format to best_effort for ClickHouse to parse all the ISO 8601 date and time formats. For example, '2018-06-08T01:02:03.000Z'.

set date_time_input_format = 'best_effort';

We use UInt8 datatype to set Boolean values and map UInt8 to Boolean internally. So when we fetch schema from ClickHouse, we treat UInt8 as Boolean.

If you are creating tables in the same database where RudderStack loads, we highly recommend not using UInt8 as a datatype except for boolean values (0,1).

Configuring ClickHouse in RudderStack

In order to enable dumping data to ClickHouse, you will first need to add it as a destination to the source from which you are sending the event data. Once the destination is enabled, events from RudderStack will automatically start to flow to ClickHouse.

To configure ClickHouse as a destination, please follow these steps:

  • Choose a source to which you would like to add ClickHouse as a destination. You can also simply create a destination and connect it to a source later.

Please follow our Adding a Source and Destination guide to know how to add a source in RudderStack.

  • After choosing a source, select ClickHouse from the list of destinations.

  • Give your destination a name and then click on Next. You should then see the following screen:

ClickHouse Destination Settings on the RudderStack Dashboard
  • Add the required credentials in the Connection Settings as mentioned below:

    • Host - The host name of your ClickHouse Database.

    • Port - The port of your ClickHouse Host

    • Database - The database name in your ClickHouse instance where the data gets loaded.

    • User - The username which has the required read/write access to the above database.

    • Password - The password for the above user.

    • Secure - To establish secure connection.

    • Skip verify - To skip certificate verification. if you are using self-signed certificates, set this field true

    • CA certificate - Certificate which needs to be verified while establishing a secure connection. Add the certificate contents to the text input

    • Bucket Provider - Intermediate storage for storing staging files. Currently, we support S3, MinIO, Google Cloud Storage, and Azure Blob Storage.

FAQs

How do we de-dupe events that are loaded into the warehouse?

We create tables with engine ReplacingMergeTree order by (received_at, id) and column dataType as Nullable(dataType).ReplacingMergeTree replaces the latest event which has the same received_at, id while merging. Nullable is not applicable for sortKeys.

How do we merge user properties in the users table?

For users table, we create table with engine AggregatingMergeTree order by id and column dataType as SimpleAggregateFunction(anyLast, Nullable(dataType)). Merging the columns with the sameidpicks the last value which is not null. Nullable is not applicable for sortKeys.

How does RudderStack handle cases when loading data into ClickHouse?

RudderStack converts the event keys into the lower case before exporting the data into ClickHouse. This is so that it does not create two tables in case the event name has two different cases.

Contact Us

If you come across any issues while configuring PostgreSQL with RudderStack, please feel free to contact us or start a conversation on our Discord channel. We will be happy to help you.