Move data from Amazon Redshift to ClickHouse quickly, preserving schema and minimizing downtime.
Choose ClickHouse when you need sub-second analytics on billions of rows, lower storage costs, and real-time inserts. Redshift’s columnar engine is powerful, but it can lag on streaming workloads and becomes expensive at scale.
1) Export Redshift tables to S3.
2) Transform data types & compression.
3) Create matching tables in ClickHouse.
4) Import files with clickhouse-client or clickhouse-copier.
5) Validate row counts & spot-check queries.
6) Cut over ETL and apps.
Use UNLOAD to write each table to gzipped CSV or Parquet in S3. Grant IAM role access and set PARALLEL OFF for predictable file names.
UNLOAD ('SELECT * FROM Orders')
TO 's3://acme-data/exports/orders_'
IAM_ROLE 'arn:aws:iam::123456:role/redshift-s3'
FORMAT AS PARQUET
PARALLEL OFF;
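If you take the gzipped CSV route instead, a similar UNLOAD works; the bucket path and role ARN below reuse the placeholders from the example above:

UNLOAD ('SELECT * FROM Orders')
TO 's3://acme-data/exports/orders_csv_'
IAM_ROLE 'arn:aws:iam::123456:role/redshift-s3'
FORMAT AS CSV
GZIP
PARALLEL OFF;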
Redshift INTEGER → ClickHouse Int32, BIGINT → Int64, DECIMAL → Decimal(38, scale), VARCHAR → String, TIMESTAMP → DateTime64(6). Cast during import if needed.
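As a sketch of casting on the way in, assuming hypothetical columns order_id, total_amount (exported as DOUBLE PRECISION), and created_at (exported as a string):

cat orders_000 | clickhouse-client --query="
  INSERT INTO Orders
  SELECT
    order_id,
    toDecimal128(total_amount, 2),
    parseDateTime64BestEffort(created_at, 6)
  FROM input('order_id Int64, total_amount Float64, created_at String')
  FORMAT Parquet"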
Use MergeTree for large immutable fact tables like OrderItems. Choose ReplacingMergeTree for slowly changing dimensions such as Products. Set ORDER BY keys that match common filters.
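A minimal DDL sketch along those lines; the columns, partition key, and ORDER BY below are illustrative rather than taken from a real Redshift schema:

CREATE TABLE OrderItems
(
    order_id   Int64,
    product_id UInt32,
    quantity   Int32,
    amount     Decimal(38, 2),
    event_date Date
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, order_id);

CREATE TABLE Products
(
    product_id UInt32,
    name       String,
    updated_at DateTime64(6)
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY product_id;

ReplacingMergeTree keeps the row with the highest updated_at per product_id, which suits dimension tables that receive occasional corrections.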
Copy objects locally with AWS CLI or stream directly:
aws s3 cp s3://acme-data/exports/orders_000 .
cat orders_000 | clickhouse-client --query="INSERT INTO Orders FORMAT Parquet"
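To skip the local copy, the s3 table function can read the exported Parquet straight from the bucket; the URL and credentials below are placeholders:

clickhouse-client --query="
  INSERT INTO Orders
  SELECT *
  FROM s3('https://acme-data.s3.amazonaws.com/exports/orders_000',
          'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'Parquet')"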
Generate DDL and copy scripts via Python or Bash. Use system catalogs to loop over tables, then dynamically run UNLOAD, build ClickHouse CREATE TABLE statements, and stream files.
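A rough Bash sketch of that loop, assuming psql access to Redshift, tables in the public schema, and the placeholder bucket and role from earlier; file names follow the pattern used above and may need adjusting to the actual UNLOAD output:

#!/usr/bin/env bash
set -euo pipefail

BUCKET="s3://acme-data/exports"
ROLE="arn:aws:iam::123456:role/redshift-s3"
RS="psql -h redshift-host -U admin -d warehouse"

# Loop over user tables from the Redshift catalog.
for t in $($RS -At -c "SELECT tablename FROM pg_tables WHERE schemaname = 'public'"); do
  # Export the table to Parquet in S3.
  $RS -c "UNLOAD ('SELECT * FROM public.${t}')
          TO '${BUCKET}/${t}_'
          IAM_ROLE '${ROLE}'
          FORMAT AS PARQUET PARALLEL OFF;"

  # Assumes the matching ClickHouse table was created beforehand.
  aws s3 cp "${BUCKET}/${t}_000" .
  cat "${t}_000" | clickhouse-client --query="INSERT INTO ${t} FORMAT Parquet"
done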
Run an initial full load, then capture deltas by tracking Redshift COPY query IDs or with a CDC pipeline such as Debezium streaming into Kafka and the ClickHouse Kafka engine. Perform a final sync during a maintenance window.
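If you use the Debezium-to-Kafka path, a common pattern is a Kafka engine table feeding the target table through a materialized view; the broker, topic, and columns below are assumptions, and Debezium's change envelope usually needs flattening (for example with its ExtractNewRecordState transform) before a flat JSONEachRow read like this works:

CREATE TABLE orders_kafka
(
    order_id   Int64,
    status     String,
    updated_at DateTime64(6)
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list  = 'dbserver.public.orders',
         kafka_group_name  = 'clickhouse-cdc',
         kafka_format      = 'JSONEachRow';

CREATE MATERIALIZED VIEW orders_cdc_mv TO Orders AS
SELECT order_id, status, updated_at
FROM orders_kafka;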
Compare SELECT COUNT(*) per table, checksum sample columns, and benchmark business queries in Galaxy to ensure answers match and latency improves.
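For instance, on the ClickHouse side (with a hypothetical amount column), compare these results against SELECT COUNT(*), SUM(amount), COUNT(DISTINCT order_id) run in Redshift:

-- Row count for the table
SELECT count() FROM Orders;

-- Cheap checksum on a sample column plus a cardinality check
SELECT sum(amount) AS amount_checksum,
       uniqExact(order_id) AS distinct_orders
FROM Orders;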
Compress Parquet with ZSTD, use UInt32 surrogate keys, partition on event date, and monitor insert lag. Keep Redshift running until the ClickHouse dashboards are accepted.
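One way to watch insert lag is the count of active parts per partition in system.parts; a steadily growing number of small parts usually means inserts are outpacing merges:

SELECT table, partition, count() AS active_parts
FROM system.parts
WHERE active
GROUP BY table, partition
ORDER BY active_parts DESC
LIMIT 10;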
Yes. Run ClickHouse in parallel, replicate new writes, and switch BI tools gradually. Decommission Redshift after confidence builds.
For one-off migrations clickhouse-client works. Use clickhouse-copier or a distributed ClickHouse cluster for terabyte-scale parallel loads.
Lock DDL in Redshift, or version tables with suffixes. For ongoing migrations, apply the same change scripts to ClickHouse immediately after Redshift.