Move data from Amazon Redshift to ClickHouse quickly, preserving schema and minimizing downtime.
Choose ClickHouse when you need sub-second analytics on billions of rows, lower storage costs, and real-time inserts. Redshift’s columnar engine is powerful, but it can lag on streaming workloads and becomes expensive at scale.
1) Export Redshift tables to S3.
2) Transform data types & compression.
3) Create matching tables in ClickHouse.
4) Import files with clickhouse-client or clickhouse-copier.
5) Validate row counts & spot-check queries.
6) Cut over ETL and apps.
Use UNLOAD to write each table to gzipped CSV or Parquet in S3. Grant IAM role access and set PARALLEL OFF for predictable file names.
UNLOAD ('SELECT * FROM Orders')
TO 's3://acme-data/exports/orders_'
IAM_ROLE 'arn:aws:iam::123456:role/redshift-s3'
FORMAT AS PARQUET
PARALLEL OFF;
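If you take the gzipped CSV route instead, a similar UNLOAD works; the bucket path and role ARN below reuse the placeholders from the example above:

UNLOAD ('SELECT * FROM Orders')
TO 's3://acme-data/exports/orders_csv_'
IAM_ROLE 'arn:aws:iam::123456:role/redshift-s3'
FORMAT AS CSV
GZIP
PARALLEL OFF;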
Redshift INTEGER → ClickHouse Int32, BIGINT → Int64, DECIMAL → Decimal(38, scale), VARCHAR → String, TIMESTAMP → DateTime64(6). Cast during import if needed.
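As a sketch of casting on the way in, assuming hypothetical columns order_id, total_amount (exported as DOUBLE PRECISION), and created_at (exported as a string):

cat orders_000 | clickhouse-client --query="
  INSERT INTO Orders
  SELECT
    order_id,
    toDecimal128(total_amount, 2),
    parseDateTime64BestEffort(created_at, 6)
  FROM input('order_id Int64, total_amount Float64, created_at String')
  FORMAT Parquet"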
Use MergeTree for large immutable fact tables like OrderItems. Choose ReplacingMergeTree for slowly changing dimensions such as Products. Set ORDER BY keys that match common filters.
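A minimal DDL sketch along those lines; the columns, partition key, and ORDER BY below are illustrative rather than taken from a real Redshift schema:

CREATE TABLE OrderItems
(
    order_id   Int64,
    product_id UInt32,
    quantity   Int32,
    amount     Decimal(38, 2),
    event_date Date
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, order_id);

CREATE TABLE Products
(
    product_id UInt32,
    name       String,
    updated_at DateTime64(6)
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY product_id;

ReplacingMergeTree keeps the row with the highest updated_at per product_id, which suits dimension tables that receive occasional corrections.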
Copy objects locally with AWS CLI or stream directly:
aws s3 cp s3://acme-data/exports/orders_000 .
cat orders_000 | clickhouse-client --query="INSERT INTO Orders FORMAT Parquet"
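To skip the local copy, the s3 table function can read the exported Parquet straight from the bucket; the URL and credentials below are placeholders:

clickhouse-client --query="
  INSERT INTO Orders
  SELECT *
  FROM s3('https://acme-data.s3.amazonaws.com/exports/orders_000',
          'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'Parquet')"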
Generate DDL and copy scripts via Python or Bash. Use system catalogs to loop over tables, then dynamically run UNLOAD, build ClickHouse CREATE TABLE statements, and stream files.
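A rough Bash sketch of that loop, assuming psql access to Redshift, tables in the public schema, and the placeholder bucket and role from earlier; file names follow the pattern used above and may need adjusting to the actual UNLOAD output:

#!/usr/bin/env bash
set -euo pipefail

BUCKET="s3://acme-data/exports"
ROLE="arn:aws:iam::123456:role/redshift-s3"
RS="psql -h redshift-host -U admin -d warehouse"

# Loop over user tables from the Redshift catalog.
for t in $($RS -At -c "SELECT tablename FROM pg_tables WHERE schemaname = 'public'"); do
  # Export the table to Parquet in S3.
  $RS -c "UNLOAD ('SELECT * FROM public.${t}')
          TO '${BUCKET}/${t}_'
          IAM_ROLE '${ROLE}'
          FORMAT AS PARQUET PARALLEL OFF;"

  # Assumes the matching ClickHouse table was created beforehand.
  aws s3 cp "${BUCKET}/${t}_000" .
  cat "${t}_000" | clickhouse-client --query="INSERT INTO ${t} FORMAT Parquet"
done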
Run an initial full load, then capture deltas by tracking Redshift COPY query IDs or with a CDC pipeline such as Debezium streaming into Kafka and the ClickHouse Kafka engine. Perform a final sync during a maintenance window.
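If you use the Debezium-to-Kafka path, a common pattern is a Kafka engine table feeding the target table through a materialized view; the broker, topic, and columns below are assumptions, and Debezium's change envelope usually needs flattening (for example with its ExtractNewRecordState transform) before a flat JSONEachRow read like this works:

CREATE TABLE orders_kafka
(
    order_id   Int64,
    status     String,
    updated_at DateTime64(6)
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list  = 'dbserver.public.orders',
         kafka_group_name  = 'clickhouse-cdc',
         kafka_format      = 'JSONEachRow';

CREATE MATERIALIZED VIEW orders_cdc_mv TO Orders AS
SELECT order_id, status, updated_at
FROM orders_kafka;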
Compare SELECT COUNT(*) per table, checksum sample columns, and benchmark business queries in Galaxy to ensure answers match and latency improves.
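For instance, on the ClickHouse side (with a hypothetical amount column), compare these results against SELECT COUNT(*), SUM(amount), COUNT(DISTINCT order_id) run in Redshift:

-- Row count for the table
SELECT count() FROM Orders;

-- Cheap checksum on a sample column plus a cardinality check
SELECT sum(amount) AS amount_checksum,
       uniqExact(order_id) AS distinct_orders
FROM Orders;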
Compress Parquet with ZSTD, use UInt32 surrogate keys, partition on event date, and monitor insert lag. Keep Redshift running until the ClickHouse dashboards are accepted.
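One way to watch insert lag is the count of active parts per partition in system.parts; a steadily growing number of small parts usually means inserts are outpacing merges:

SELECT table, partition, count() AS active_parts
FROM system.parts
WHERE active
GROUP BY table, partition
ORDER BY active_parts DESC
LIMIT 10;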
Yes. Run ClickHouse in parallel, replicate new writes, and switch BI tools gradually. Decommission Redshift after confidence builds.
For one-off migrations clickhouse-client works. Use clickhouse-copier or a distributed ClickHouse cluster for terabyte-scale parallel loads.
Lock DDL in Redshift, or version tables with suffixes. For ongoing migrations, apply the same change scripts to ClickHouse immediately after Redshift.