ClickHouse is an open-source, column-oriented database management system mainly used for online analytical processing (OLAP). It is fast, and allows for real-time analysis of your data. ClickHouse was originally developed by Yandex, the Russian IT company, for the Yandex Metrica web analytics service. ClickHouse is highly scalable, and built for high performance.
You can now integrate RudderStack with ClickHouse seamlessly and store your customer event data into it, from the source of your choice.
You to need to set
best_effort for ClickHouse to parse all the ISO 8601 date and time formats. For example,
set date_time_input_format = 'best_effort';
You will need to whitelist the RudderStack IPs to enable network access to it.
In order to enable dumping data to ClickHouse, you will first need to add it as a destination to the source from which you are sending the event data. Once the destination is enabled, events from RudderStack will automatically start to flow to ClickHouse.
To configure ClickHouse as a destination, please follow these steps:
Choose a source to which you would like to add ClickHouse as a destination. You can also simply create a destination and connect it to a source later.
After choosing a source, select ClickHouse from the list of destinations.
Give your destination a name and then click on Next. You should then see the following screen:
Add the required credentials in the Connection Settings as mentioned below:
Host - The host name of your ClickHouse Database.
Port - The TCP port of your ClickHouse Host. ex: 9000. if you want the connection to be secure use TCP secure port 9440. reference
Database - The database name in your ClickHouse instance where the data gets loaded.
Cluster - The name of your ClickHouse cluster Please note: Leave this blank if you are running a single host ClickHouse Cluster
User - The username which has the required read/write access to the above database.
Password - The password for the above user.
Secure - To establish secure connection.
Skip verify - To skip certificate verification. if you are using self-signed certificates, set this field
CA certificate - Certificate which needs to be verified while establishing a secure connection. Add the certificate contents to the text input
Bucket Provider - Intermediate storage for storing staging files. Currently, we support S3, MinIO, Google Cloud Storage, and Azure Blob Storage.
We create tables with engine
ReplacingMergeTree order by (received_at, id) and column
ReplacingMergeTree replaces the latest event which has the same
received_at, id while merging.
Nullable is not applicable for
For the user's table, we create a table with an engine
AggregatingMergeTree order by
id and column
SimpleAggregateFunction(anyLast, Nullable(dataType)). Merging the columns with the same
idpicks the last value which is not null.
Nullable is not applicable for
RudderStack converts the event keys into the lower case before exporting the data into ClickHouse. This is so that it does not create two tables in case the event name has two different cases.