Move tables, schema, and data from Google BigQuery to ClickHouse with minimal downtime.
Teams switch to ClickHouse to cut query latency and cost for high-volume analytics. Migrating keeps historical data accessible in the new engine.
Use the EXPORT DATA statement to write each source table into compressed Parquet files stored in Google Cloud Storage (GCS). Parquet preserves column types and is natively understood by ClickHouse.
EXPORT DATA OPTIONS(
  uri='gs://bq_exports/customers_*.parquet',
  format='PARQUET',
  compression='SNAPPY'
) AS
SELECT * FROM `project.dataset.Customers`;
Connect to ClickHouse and run CREATE TABLE with column types that best match the BigQuery schema. Use a MergeTree engine for OLAP workloads.
CREATE TABLE Customers (
    id UInt64,
    name String,
    email String,
    created_at DateTime
)
ENGINE = MergeTree
ORDER BY id;
Run INSERT INTO ... SELECT * with the s3 table function to read the Parquet files directly from GCS, or pipe the files through clickhouse-client --query using the input() table function.
clickhouse-client --query "
  INSERT INTO Customers
  SELECT * FROM s3('https://storage.googleapis.com/bq_exports/customers_*.parquet',
                   'GCS_ACCESS', 'GCS_SECRET', 'Parquet')"
Yes. Wrap the SELECT in your INSERT to cast types, split JSON, or aggregate rows while streaming the Parquet input.
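For example, a load that normalizes types while streaming might look like the sketch below; the JSON column attributes and the string-typed created_at are assumptions for illustration, not part of the schema shown earlier.

INSERT INTO Customers
SELECT
    toUInt64(id),                                       -- cast BigQuery INT64 to UInt64
    name,
    JSONExtractString(attributes, 'email') AS email,    -- pull one field out of a JSON column
    parseDateTimeBestEffort(created_at) AS created_at   -- parse a string timestamp into DateTime
FROM s3('https://storage.googleapis.com/bq_exports/customers_*.parquet',
        'GCS_ACCESS', 'GCS_SECRET', 'Parquet');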
Schedule incremental BigQuery exports based on created_at or updated_at and append them to ClickHouse until your application flips to the new database.
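A delta export might look like the following sketch; the customers_delta_ file prefix and the one-day window are assumptions you would adjust to your own schedule.

EXPORT DATA OPTIONS(
  uri='gs://bq_exports/customers_delta_*.parquet',
  format='PARQUET',
  compression='SNAPPY'
) AS
SELECT * FROM `project.dataset.Customers`
WHERE updated_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);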
Split exports by date range, load them in parallel, and batch inserts into large blocks so ClickHouse creates fewer small parts; raise max_partitions_per_insert_block if a single insert touches more partitions than the default limit allows.
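One worker per date slice could look like the sketch below; the date-stamped object names are an assumption about how you split the exports.

# Load a single date slice; run one such command per slice in parallel
clickhouse-client --query "
  INSERT INTO Customers
  SELECT * FROM s3('https://storage.googleapis.com/bq_exports/customers_2024-01-15_*.parquet',
                   'GCS_ACCESS', 'GCS_SECRET', 'Parquet')
  SETTINGS max_partitions_per_insert_block = 1000"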
Mismatch in numeric precision? Cast to Decimal(18,2). UTF-8 issues? Force String columns and clean later.
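Both fixes can be applied inside the loading query itself; the columns amount and comment and the orders_ file pattern below are hypothetical, used only to show the casts.

SELECT
    toDecimal64(amount, 2) AS amount,   -- Decimal(18,2): 18 digits of precision, scale 2
    toValidUTF8(comment)   AS comment   -- replace invalid UTF-8 sequences instead of failing
FROM s3('https://storage.googleapis.com/bq_exports/orders_*.parquet',
        'GCS_ACCESS', 'GCS_SECRET', 'Parquet');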
No. The fastest path is exporting to storage (GCS or S3-compatible) and loading via ClickHouse's s3 or url table functions.
Roughly 150-200 GB/hour per ClickHouse node when loading compressed Parquet. Parallelize loads to maximize throughput.
Yes. Use BigQuery scheduled queries that append only the last hour or day of data to GCS, plus a cron job on the ClickHouse side to load them.
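A minimal hourly loader could be as simple as the cron entry below; in practice you would also track or move already-loaded files (the customers_delta_ prefix is an assumption) so the same rows are not inserted twice.

# Hypothetical hourly load of the latest delta export
0 * * * * clickhouse-client --query "INSERT INTO Customers SELECT * FROM s3('https://storage.googleapis.com/bq_exports/customers_delta_*.parquet', 'GCS_ACCESS', 'GCS_SECRET', 'Parquet')"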