How to Migrate from MySQL to Redshift

Galaxy Glossary

How do I migrate a MySQL database to Amazon Redshift?

Move data, schema, and workload from MySQL into Amazon Redshift using DDL conversion, S3 staging, and COPY.

Sign up for the latest in SQL knowledge from the Galaxy Team!

Description

What does a MySQL → Redshift migration involve?

The process converts MySQL schema to Redshift-compatible DDL, stages data in Amazon S3, bulk-loads with COPY, then validates and switches traffic.

Which tools should I use?

Use mysqldump for DDL extraction, AWS Schema Conversion Tool (SCT) for type mapping, AWS Database Migration Service (DMS) or UNLOAD/COPY for data, and Galaxy for query validation.

How do I prepare the Redshift schema?

Run SCT on the MySQL DDL. Review numeric and datetime types, replace AUTO_INCREMENT with IDENTITY(1,1), and create the tables in Redshift.

Example DDL conversion

CREATE TABLE customers (id INT IDENTITY(1,1), name VARCHAR(255), email VARCHAR(255), created_at TIMESTAMP);

How do I export data from MySQL?

Export each table to CSV or Parquet in S3. Example with the AWS CLI: mysqldump --tab=/tmp --fields-terminated-by='\t' ecommerce_db Orders, then aws s3 cp /tmp/Orders.txt s3://ecom-bucket/stage/orders/.

How do I bulk-load into Redshift?

Use COPY with IAM role, region, and format options. Split large tables into multiple files for parallelism.

COPY example for Orders

COPY Orders FROM 's3://ecom-bucket/stage/orders/' IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopy' DELIMITER '\t' TIMEFORMAT 'auto';

Can I migrate continuously?

Yes. Create an AWS DMS replication task in full-load + CDC mode. DMS loads existing rows, streams binlog changes, and minimizes downtime.

How do I validate the migration?

Count rows (SELECT COUNT(*)) and checksum columns in both systems. For critical tables run spot queries in Galaxy and compare results.

What is the cut-over checklist?

1) Pause MySQL writes. 2) Let DMS apply final CDC lag. 3) Point reporting dashboards to Redshift. 4) Monitor query latency and data freshness.

Best practices for smooth migration

Partition large tables in S3, compress with gzip, set STATUPDATE ON, and run ANALYZE after every load to keep query plans optimal.

Why How to Migrate from MySQL to Redshift is important

How to Migrate from MySQL to Redshift Example Usage


-- Load incremental OrderItems after full export
COPY OrderItems FROM 's3://ecom-bucket/stage/orderitems/2024-06-28/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopy'
FORMAT AS PARQUET;

How to Migrate from MySQL to Redshift Syntax


-- 1. Extract schema from MySQL
a) mysqldump --no-data -u user -p ecommerce_db > ddl.sql
b) Run AWS SCT to convert ddl.sql to Redshift DDL

-- 2. Create tables in Redshift
CREATE TABLE Customers (
  id INT IDENTITY(1,1) PRIMARY KEY,
  name VARCHAR(255),
  email VARCHAR(255),
  created_at TIMESTAMP
);

-- 3. Export data from MySQL to S3
mysqldump --tab=/tmp --fields-terminated-by='\t' ecommerce_db Customers
aws s3 cp /tmp/Customers.txt s3://ecom-bucket/stage/customers/

-- 4. Load data into Redshift
COPY Customers
FROM 's3://ecom-bucket/stage/customers/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopy'
DELIMITER '\t'
TIMEFORMAT 'auto';

-- 5. Continuous replication (optional)
aws dms create-replication-task \
  --migration-type full-load-and-cdc \
  --table-mappings file://mapping.json \
  --replication-task-settings file://settings.json;

Common Mistakes

Relying on <code>INSERT</code> statements instead of <code>COPY</code>. INSERT loads rows serially and slows down large tables. Always stage files in S3 and use COPY for parallel ingestion.
Ignoring sort and distribution keys. Without them queries may scan entire tables. Choose DISTKEY on join columns (e.g., customer_id) and SORTKEY on date columns (e.g., order_date).

Frequently Asked Questions (FAQs)

Is Redshift fully compatible with MySQL?

No. Redshift is based on PostgreSQL 8.0. Unsupported MySQL features (e.g., ENUM, ON DUPLICATE KEY) must be rewritten.

Can I migrate without downtime?

Use AWS DMS with full-load + CDC. Schedule the cut-over during low traffic and keep CDC running until zero lag.

Want to learn about other SQL terms?

dbt tests

dbt tests are declarative assertions in dbt that automatically validate data quality during model builds.

dbt utils

dbt utils is a community-maintained dbt package that adds time-saving utility macros for analytics engineers, such as generating surrogate keys, safely unioning models, and automating date spines.

dbt Tutorial Glossary: Build, Test & Deploy SQL Models

dbt is an open-source framework that lets data teams transform raw warehouse tables into tested, documented, production-ready datasets using version-controlled SQL.

Trusted by top engineers on high-velocity teams

Assort Health

Curri

Rubie