How do I convert Files from Parquet to CSV or JSON?
Converting Files from Parquet to CSV or JSON
You can use clickhouse-local
to convert files between any of the input and output formats that ClickHouse supports (which is over 70 different formats!). In this article, we are convert a Parquet file in S3 into a CSV and JSON file.
Let's start at the beginning. ClickHouse has a collection of table functions that read from files, databases and other resoures and converts the data to a table. To demonstrate, suppose we have a Parquet file in S3. We will use the s3
table function to read it (ClickHouse knows it's a Parquet file based on the filename).
But first, let's download the clickhouse
binary:
Accessing the data using a table function
Let's verify we can read the file by using DESCRIBE
on the resulting table that the s3
table function creates:
This particular file contains home prices of properties sold in the United Kingdom. The response looks like:
You can run any query you want on the data. For example, let's see which towns have the highest average price of homes:
The response looks like:
Convert the Parquet file to a CSV
You can send the result of any SQL query to a file. Let's grab all the columns from our Parquet file in S3 and send the output to a new CSV file. Because the output file ends in .csv
, ClickHouse knows to use the CSV
output format:
Let's verify it worked:
Convert the Parquet file to a JSON
To convert the Parquet file to JSON, simply change the extension on the output filename:
Let's verify it worked:
Convert CSV to Parquet
It works both ways - we can easily read in the new CSV file and output it into a Parquet file. The local file house_prices.csv
can be read in ClickHouse using the file
table function, and ClickHouse outputs the file in Parquet format based on the filename ending in .parquet
(or we could have added the FORMAT Parquet
clause):
As we mentioned above, you can use any of the ClickHouse input and output formats along with clickhouse local
to easily convert files into different formats.