Move data, schema, and workload from MySQL into Amazon Redshift using DDL conversion, S3 staging, and COPY.
The process converts the MySQL schema to Redshift-compatible DDL, stages data in Amazon S3, bulk-loads it with COPY, then validates and switches traffic.
Use mysqldump for DDL extraction, AWS Schema Conversion Tool (SCT) for type mapping, AWS Database Migration Service (DMS) or a manual S3 export plus COPY for data, and Galaxy for query validation.
Run SCT on the MySQL DDL. Review numeric and datetime types, replace AUTO_INCREMENT with IDENTITY(1,1), and create the tables in Redshift.
CREATE TABLE customers (id INT IDENTITY(1,1), name VARCHAR(255), email VARCHAR(255), created_at TIMESTAMP);
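For comparison, a hypothetical MySQL source definition that would map to the table above (the exact source schema is an assumption for illustration):
CREATE TABLE customers (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255), email VARCHAR(255), created_at DATETIME); -- AUTO_INCREMENT becomes IDENTITY(1,1); DATETIME becomes TIMESTAMP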
Export each table to CSV or Parquet and stage it in S3, for example with mysqldump --tab and the AWS CLI:
mysqldump --tab=/tmp --fields-terminated-by='\t' ecommerce_db Orders
aws s3 cp /tmp/Orders.txt s3://ecom-bucket/stage/orders/
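For very large tables, splitting and compressing the export before upload lets COPY read the files in parallel. A minimal sketch, reusing the /tmp/Orders.txt export above (the chunk size is arbitrary):
split -l 1000000 /tmp/Orders.txt /tmp/orders_part_   # split the export into ~1M-row chunks
gzip /tmp/orders_part_*                              # compress each chunk
aws s3 cp /tmp/ s3://ecom-bucket/stage/orders/ --recursive --exclude "*" --include "orders_part_*.gz"   # upload only the chunks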
Use COPY with an IAM role, region, and format options. Split large tables into multiple files for parallelism.
COPY Orders FROM 's3://ecom-bucket/stage/orders/' IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopy' DELIMITER '\t' TIMEFORMAT 'auto';
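If the staging prefix contains files that should not be loaded, a manifest makes the file list explicit. A sketch assuming the split, gzipped chunks from the example above (the manifest key name is arbitrary); upload the manifest JSON to s3://ecom-bucket/stage/orders/orders.manifest:
{"entries": [{"url": "s3://ecom-bucket/stage/orders/orders_part_aa.gz", "mandatory": true}, {"url": "s3://ecom-bucket/stage/orders/orders_part_ab.gz", "mandatory": true}]}
COPY Orders FROM 's3://ecom-bucket/stage/orders/orders.manifest' IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopy' DELIMITER '\t' GZIP MANIFEST TIMEFORMAT 'auto';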
Yes. Create an AWS DMS replication task in full-load + CDC mode. DMS loads existing rows, streams binlog changes, and minimizes downtime.
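A minimal CLI sketch of such a task; the endpoint and replication instance ARNs are placeholders, and the table mapping simply includes every table in ecommerce_db:
aws dms create-replication-task --replication-task-identifier mysql-to-redshift --migration-type full-load-and-cdc --source-endpoint-arn arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE --target-endpoint-arn arn:aws:dms:us-east-1:123456789012:endpoint:TARGET --replication-instance-arn arn:aws:dms:us-east-1:123456789012:rep:INSTANCE --table-mappings '{"rules":[{"rule-type":"selection","rule-id":"1","rule-name":"all-tables","object-locator":{"schema-name":"ecommerce_db","table-name":"%"},"rule-action":"include"}]}'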
Count rows (SELECT COUNT(*)) and checksum columns in both systems. For critical tables, run spot queries in Galaxy and compare results.
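A simple validation sketch is to run the same aggregates on MySQL and Redshift and diff the output; total_amount here is a hypothetical numeric column used for illustration:
SELECT COUNT(*) AS row_count, SUM(total_amount) AS amount_sum, MIN(created_at) AS first_row, MAX(created_at) AS last_row FROM Orders; -- run on both systems; the results should match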
1) Pause MySQL writes. 2) Let DMS apply final CDC lag. 3) Point reporting dashboards to Redshift. 4) Monitor query latency and data freshness.
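One way to pause writes in step 1, assuming a privileged MySQL account (revert the setting if you need to roll back):
SET GLOBAL read_only = ON; -- blocks writes from non-privileged connections while DMS drains the remaining changes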
Partition large tables in S3, compress with gzip, set STATUPDATE ON, and run ANALYZE after every load to keep query plans optimal.
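A sketch of a load that applies these options, reusing the bucket and role from the earlier examples:
COPY Orders FROM 's3://ecom-bucket/stage/orders/' IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopy' DELIMITER '\t' GZIP STATUPDATE ON TIMEFORMAT 'auto';
ANALYZE Orders;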
No. Redshift is based on PostgreSQL 8.0.2, so unsupported MySQL features (e.g., ENUM, INSERT ... ON DUPLICATE KEY UPDATE) must be rewritten.
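For example, an INSERT ... ON DUPLICATE KEY UPDATE statement is typically rewritten as a staged merge in Redshift; customers_staging is a hypothetical staging table loaded via COPY:
BEGIN;
DELETE FROM customers USING customers_staging WHERE customers.id = customers_staging.id; -- remove rows that are about to be replaced
INSERT INTO customers SELECT * FROM customers_staging; -- insert new and updated rows
COMMIT;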
Use AWS DMS with full-load + CDC. Schedule the cut-over during low traffic and keep CDC running until zero lag.
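One way to confirm that lag has reached zero before cutting over is to watch the DMS CDC latency metric in CloudWatch; the instance and task identifiers and the time window are placeholders for this sketch:
aws cloudwatch get-metric-statistics --namespace AWS/DMS --metric-name CDCLatencyTarget --dimensions Name=ReplicationInstanceIdentifier,Value=my-dms-instance Name=ReplicationTaskIdentifier,Value=mysql-to-redshift --start-time 2024-01-01T00:00:00Z --end-time 2024-01-01T01:00:00Z --period 60 --statistics Maximum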