Move data and schema from a PostgreSQL source into Amazon Redshift with minimal downtime and full fidelity.
Teams outgrow single-instance Postgres when analytics queries slow OLTP workloads. Redshift’s columnar storage and MPP architecture scale reads without impacting the transactional database.
Create an Amazon S3 bucket, provision a Redshift cluster, and ensure network connectivity from Postgres to S3 or to an AWS DMS replication instance.
Run pg_dump --schema-only
and feed the output into the AWS Schema Conversion Tool (SCT). SCT flags unsupported data types (e.g., uuid
) and suggests Redshift-friendly equivalents such as VARCHAR(36)
.
1) Offline: UNLOAD
Postgres tables to CSV, upload to S3, then COPY
into Redshift. 2) Minimal-downtime: Use AWS DMS to replicate ongoing changes until cut-over.
Use the aws_s3
extension or psql piping:\copy (SELECT * FROM orders) TO PROGRAM 'gzip > orders.csv.gz' CSV
then aws s3 cp orders.csv.gz s3://my-bucket/
.
Grant the cluster an IAM role with S3 read access and run:COPY orders FROM 's3://my-bucket/orders.csv.gz' IAM_ROLE 'arn:aws:iam::123:role/redshiftCopy' GZIP CSV;
Configure AWS DMS with Postgres as the source and Redshift as the target. Enable “ongoing replication” to stream changes while users stay online.
Run row counts and checksums on critical tables:SELECT COUNT(*) FROM orders;
in both systems should match. Verify aggregates like total revenue.
After data parity checks pass, stop writes to Postgres, let DMS apply the last changes, then update downstream BI tools to read from Redshift.
Define DISTKEY
on high-cardinality join columns (e.g., customer_id
) and SORTKEY
on frequently filtered dates (e.g., order_date
). Schedule VACUUM
and ANALYZE
.
1) Forgetting to convert sequences; use Redshift IDENTITY
. 2) Loading data without compressing; always GZIP large exports.
No. Use AWS DMS for continuous replication and switch once lag is zero.
Yes. Specify table mappings in DMS or export only selected tables to S3.
Redshift handles petabytes. Split exports into ≤6.2 GB files for optimal parallel COPY.