Move data, schema, and workloads from a local PostgreSQL server into a fully-managed Amazon Redshift cluster with minimal downtime.
Redshift separates storage and compute on RA3 nodes, scales out on demand through Concurrency Scaling, and offloads maintenance. Teams get columnar storage, massively parallel processing (MPP), and painless backups without buying new hardware.
1) Assess schema and code
2) Spin up a Redshift cluster
3) Create S3 staging buckets
4) Extract data from PostgreSQL
5) Load into Redshift
6) Validate and switch workloads
Run AWS Schema Conversion Tool (SCT) to scan objects, flag incompatible types, and generate a migration assessment report. Fix issues in PostgreSQL before export to reduce surprises.
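Independently of SCT, a quick catalog query can surface columns whose types usually need conversion before a Redshift load. A hedged sketch; adjust the schema filter to your database:

```sql
-- Flag column types that often have no direct Redshift equivalent
-- (arrays, enums/composites, json/jsonb, xml, uuid)
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
  AND data_type IN ('ARRAY', 'USER-DEFINED', 'json', 'jsonb', 'xml', 'uuid');
```

Anything this returns deserves a look in the SCT report before you export.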
Export to S3 in multiple part files so Redshift's COPY can read them in parallel; a single giant file is loaded by one slice only. Split huge tables into several similarly sized part files. (Note that PARALLEL OFF is a Redshift UNLOAD option that forces a single output file, the opposite of what you want here.)
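One way to produce multiple part files from PostgreSQL is to export disjoint slices of the table, for example by modulo on a key. A sketch assuming a hypothetical integer customer_id column:

```sql
-- Part 0 of 4; repeat with customer_id % 4 = 1, 2, 3 for the other files
COPY (SELECT * FROM public.customers WHERE customer_id % 4 = 0)
TO '/var/lib/postgresql/unload/customers_part0.csv'
WITH (FORMAT csv, HEADER true);
```

The four exports can run in separate sessions concurrently, which also speeds up the extract itself.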
Let SCT generate CREATE TABLE scripts, or hand-write DDL with DISTKEY and SORTKEY tuned for your query patterns. Leave compression to ENCODE AUTO, the default on current Redshift, unless profiling shows a better per-column encoding.
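A hand-written DDL sketch for a hypothetical customers table; the column names and key choices are assumptions, so tune DISTKEY and SORTKEY to your own joins and filters:

```sql
CREATE TABLE public.customers (
    customer_id BIGINT NOT NULL,
    email       VARCHAR(256),
    country     VARCHAR(2),
    created_at  TIMESTAMP
)
DISTKEY (customer_id)   -- co-locates rows joined on customer_id
SORTKEY (created_at);   -- speeds range-filtered scans on signup date
-- No per-column ENCODE is given, so Redshift applies ENCODE AUTO
```

DISTKEY on the join column avoids network redistribution during joins; SORTKEY on the timestamp lets range predicates skip blocks.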
# Custom-format dump of one table, useful as a rollback backup.
# (--column-inserts makes restores very slow and is unnecessary
# for a CSV-based Redshift load, so it is omitted here.)
pg_dump \
  --format=custom \
  --file=customers.dump \
  --table=public.customers
Or export CSV directly. UNLOAD is a Redshift command; the PostgreSQL equivalent is COPY ... TO:
COPY (SELECT * FROM public.customers)
TO '/var/lib/postgresql/unload/customers.csv'
WITH (FORMAT csv, HEADER true);
aws s3 cp /var/lib/postgresql/unload/ s3://ecom-staging/customers/ --recursive
Use aws s3 sync for delta uploads during cut-over.
COPY customers
FROM 's3://ecom-staging/customers/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS CSV
IGNOREHEADER 1
TIMEFORMAT 'auto'
REGION 'us-east-1';
-- the CREDENTIALS 'aws_iam_role=...' form is deprecated; use IAM_ROLE
-- add the GZIP option if the part files are compressed
Load each table in parallel to maximize cluster throughput.
AWS Database Migration Service (DMS) can stream CDC from PostgreSQL to Redshift. Enable logical replication in postgresql.conf, create a replication slot, and let DMS apply ongoing changes until cut-over.
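The PostgreSQL side of that setup looks roughly like this; the slot name is hypothetical, and DMS can also create its own slot:

```sql
-- postgresql.conf (restart required):
--   wal_level = logical
--   max_replication_slots = 10
--   max_wal_senders = 10

-- Verify the setting took effect after restart
SHOW wal_level;

-- Optionally pre-create a slot for DMS (test_decoding plugin shown)
SELECT pg_create_logical_replication_slot('dms_slot', 'test_decoding');
```

Unused slots retain WAL indefinitely, so drop the slot if you abandon the CDC task.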
Run row counts, checksums, and spot-check business queries. Unless you have mapped Redshift tables into PostgreSQL through a foreign data wrapper, run the same query on each system and compare the results:
-- On PostgreSQL
SELECT COUNT(*) AS row_count FROM public.customers;
-- On Redshift
SELECT COUNT(*) AS row_count FROM public.customers;
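Beyond raw counts, an aggregate "fingerprint" run verbatim on both systems catches silent truncation or type drift. The columns here are assumptions for illustration:

```sql
-- Run unchanged on PostgreSQL and on Redshift; results should match exactly
-- (SUM over a huge BIGINT column can overflow on Redshift; cast if needed)
SELECT COUNT(*)         AS row_count,
       SUM(customer_id) AS id_sum,
       MIN(created_at)  AS first_row,
       MAX(created_at)  AS last_row
FROM public.customers;
```

Any mismatch narrows the problem to a table before you start diffing individual rows.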
After CDC lag is near zero, point BI tools and ETL pipelines to Redshift endpoints. Keep PostgreSQL read-only for a day as a rollback plan.
• Compress export files with gzip to cut network transfer time.
• Size Redshift WLM queues before heavy loads.
• Keep S3 buckets in the same region as the cluster.
• Tag all AWS resources for cost tracking.
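The compression tip can be applied at export time: PostgreSQL's COPY ... TO PROGRAM pipes the output straight through gzip. A sketch, assuming the same staging path as above:

```sql
-- Requires superuser (or the pg_execute_server_program role);
-- writes a gzipped CSV directly on the database host
COPY (SELECT * FROM public.customers)
TO PROGRAM 'gzip > /var/lib/postgresql/unload/customers.csv.gz'
WITH (FORMAT csv, HEADER true);
```

Redshift's COPY reads these files directly when you add its GZIP option.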
1) Copying without compression balloons S3 costs. Always compress exports.
2) Ignoring DISTKEY/SORTKEY yields slow joins. Tune keys before loading big tables.
Query stl_load_errors for row-level error messages. Fix data types and escaping issues at the source, and retry with ACCEPTINVCHARS only as a last resort.
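A typical triage query against that system table; the ordering and LIMIT are just conveniences:

```sql
-- Most recent load failures, newest first
SELECT starttime, filename, line_number, colname, err_code, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 20;
```

The filename and line_number columns point you at the exact offending record in S3.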
With gzip + multi-part UNLOAD and Redshift RA3, expect 1-2 hours for extract and 20-40 minutes for COPY, assuming 10 Gbps network.
Yes. Keep PostgreSQL online in read-only mode until Redshift is fully validated. Switch DNS only after confirming query results.
Not necessarily. Use AWS DMS with CDC to stream ongoing changes while users stay on PostgreSQL, then cut over once lag is minimal.
Redshift supports PL/pgSQL but with limitations. Use SCT to convert code and manually rewrite unsupported features like triggers or LISTEN/NOTIFY.
Create an IAM role with least privilege for COPY/UNLOAD, enable SSL on the Redshift cluster, and encrypt S3 buckets with SSE-KMS.