How can I use the new JSON Data Type with Kafka?
Kafka and the JSON Data Type
With the introduction of the new JSON
data type, ClickHouse is now a good choice of database for doing JSON analytics.
In this guide, we're going to learn how to load JSON messages from Apache Kafka directly into a single JSON
column in ClickHouse.
Setup Kafka
Let's start by running a Kafka broker on our machine. We're also going to map port 9092 to port 9092 on our host operating system so that it's easier to interact with Kafka:
Ingest data into Kafka
Once that's running, we need to ingest some data.
The Wikimedia recent changes feed is a good source of streaming data, so let's ingest that into the wiki_events
topic:
We can check that the data is being ingested by running the following command:
So far, so good.
Ingest data into ClickHouse
Next, we're going to ingest the data into ClickHouse. First, let's enable the JSON type (which is currently experimental), by setting the following property:
Now, we'll create the wiki_queue
table, which uses the Kafka
table engine.
Note that we're using the JSONAsObject
format, which will ensure that incoming messages are made available as a JSON object.
This format can only be parsed into a table that has a single column with the JSON
type.
Next, we'll create the underlying table to store the Wiki data:
Finally, let's create a materialized view to populate the wiki
table:
Querying JSON data in ClickHouse
We can then write queries against the wiki
table.
For example, we could count the number of bots that have committed changes:
Or we could find out the users that make the most changes on en.wikipedia.org
: