How to Bulk Load Data in Amazon Redshift

Galaxy Glossary

How do I bulk load data into Amazon Redshift?

COPY ingests external files (CSV, JSON, Parquet, etc.) into Redshift tables in parallel for fast, scalable bulk loading.


Description

What does the COPY command do in Redshift?

COPY ingests data files stored outside the cluster and writes them into a target table using Redshift’s massively parallel architecture. It delivers far higher throughput than INSERT, supports automatic compression, and validates each row while loading.

Which file formats can I load?

COPY handles CSV, JSON, Parquet, Avro, ORC, and fixed-width text. It also accepts GZIP- or BZIP2-compressed files. Pick the FORMAT option that matches your files.
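For example, columnar formats such as Parquet need only the FORMAT keyword, while compressed CSV adds the compression keyword alongside the usual CSV options. A sketch, using illustrative bucket paths and the role ARN from the examples below:

```sql
-- Parquet: column types are read from the file metadata
COPY events
FROM 's3://galaxy-data/events_parquet/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
FORMAT AS PARQUET;

-- Gzipped CSV: add GZIP next to the CSV settings
COPY events
FROM 's3://galaxy-data/events_csv/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
FORMAT AS CSV
GZIP
IGNOREHEADER 1;
```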

How do I prepare my S3 bucket?

Place source files in an S3 path that the cluster can reach. Attach an IAM role with s3:GetObject (and s3:ListBucket) permission. If you must use key-based credentials in the command instead, keep the keys in AWS Secrets Manager.
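A minimal IAM policy for the load role might look like the following (bucket name is illustrative; scope the resources to your own bucket and prefix):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::galaxy-data/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::galaxy-data"
    }
  ]
}
```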

How to bulk load a CSV file step-by-step?

Step 1 — Create the target table

Define columns in the same order as the file, and use IGNOREHEADER 1 to skip a header row.
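A table matching the Orders file used in the examples below could be defined like this (column names and types are illustrative):

```sql
-- Columns in the same order as the CSV file
CREATE TABLE Orders (
    id            BIGINT,
    customer_id   BIGINT,
    order_date    DATE,
    total_amount  DECIMAL(12,2)
);
```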

Step 2 — Run COPY

Specify the S3 URI, IAM role, and CSV settings. Use MAXERROR to set error tolerance, COMPUPDATE ON to apply optimal compression encodings, and STATUPDATE ON to refresh table statistics automatically.
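Putting those options together, a tolerant load might look like this (paths and ARN are illustrative):

```sql
COPY Orders (id, customer_id, order_date, total_amount)
FROM 's3://galaxy-data/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
FORMAT AS CSV
IGNOREHEADER 1
MAXERROR 10      -- tolerate up to 10 bad rows before aborting
COMPUPDATE ON
STATUPDATE ON;
```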

Why choose MANIFEST for multiple files?

When dozens of files exist in varying locations, generate a JSON manifest listing each object and pass MANIFEST to COPY. Redshift loads only the referenced files.
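A manifest is a small JSON file listing each object to load; a sketch with hypothetical object keys:

```json
{
  "entries": [
    {"url": "s3://galaxy-data/orders/part-0001.csv", "mandatory": true},
    {"url": "s3://galaxy-archive/orders/part-0002.csv", "mandatory": true}
  ]
}
```

Upload the manifest to S3, then point COPY at it and add the MANIFEST keyword:

```sql
COPY Orders
FROM 's3://galaxy-data/manifests/orders.manifest'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
FORMAT AS CSV
MANIFEST;
```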

How to load JSON data?

Add the JSON parameter pointing to a JSONPaths file or 'auto'. Redshift maps the JSON fields to the table's columns during the load.
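A JSONPaths file lists one path expression per target column, in column order. A sketch with illustrative field names:

```json
{
  "jsonpaths": [
    "$.id",
    "$.customer_id",
    "$.order_date",
    "$.total_amount"
  ]
}
```

The COPY command then references the JSONPaths file, or uses 'auto' to match keys to column names:

```sql
COPY Orders
FROM 's3://galaxy-data/orders_json/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
FORMAT AS JSON 's3://galaxy-data/jsonpaths/orders_jsonpaths.json';
-- or: FORMAT AS JSON 'auto'
```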

Best practices for reliable bulk loads

Stage data in equal-sized files (100 MB–1 GB) to maximize parallelism. Compress files to cut transfer time. Grant least-privilege IAM roles and log loads with STL_LOAD_COMMITS.
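Splitting and compressing can be done with standard Unix tools before uploading; a sketch with illustrative file names and a tiny sample file:

```shell
# Split a CSV into equal-sized chunks and compress each one
# before uploading to S3 (file names here are illustrative)
seq 1 1000 > orders.csv
split -l 250 orders.csv orders_part_   # four 250-line chunks
gzip orders_part_*                     # orders_part_aa.gz ... orders_part_ad.gz
ls orders_part_*.gz
```

In practice, size the chunks so each compressed file lands in the 100 MB–1 GB range and the count is a multiple of the cluster's slice count.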



How to Bulk Load Data in Amazon Redshift Example Usage


-- Stage the file in S3 first (run from a shell):
--   aws s3 cp orders_2024_05_15.csv s3://galaxy-data/orders/

-- Then load today's orders from S3 into Redshift

COPY Orders(id, customer_id, order_date, total_amount)
FROM 's3://galaxy-data/orders/orders_2024_05_15.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
FORMAT AS CSV
DATEFORMAT 'auto'
TIMEFORMAT 'auto'
COMPUPDATE ON
STATUPDATE ON;

How to Bulk Load Data in Amazon Redshift Syntax


COPY target_table [(column_list)]
FROM 's3://bucket/prefix/'
CREDENTIALS 'aws_access_key_id=<id>;aws_secret_access_key=<key>'
-- or  IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
[REGION 'us-east-2']
FORMAT AS {CSV|JSON 'path'|PARQUET|AVRO|ORC}
[DELIMITER ',' ] [IGNOREHEADER 1]
[MANIFEST]
[COMPUPDATE {ON|OFF}]
[STATUPDATE {ON|OFF}]
[ACCEPTINVCHARS]
[DATEFORMAT 'auto']
[TIMEFORMAT 'auto']
[MAXERROR 0]
[TRUNCATECOLUMNS];

-- Example for the Customers table
COPY Customers(id, name, email, created_at)
FROM 's3://galaxy-data/customers/2024/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
FORMAT AS CSV
IGNOREHEADER 1
TIMEFORMAT 'auto'
COMPUPDATE ON;


Frequently Asked Questions (FAQs)

Can I load data directly from a local file?

No. COPY reads only from S3 (or DynamoDB, EMR, and remote hosts via SSH). Stage the file in S3 first, or use INSERT statements, for example via the Redshift Data API, for small data sets.

How do I monitor a COPY operation?

Query STL_LOAD_COMMITS and STL_LOAD_ERRORS for row counts, error details, and load times. CloudWatch also captures COPY logs.
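Two quick monitoring queries against those system tables (column selections are illustrative):

```sql
-- Recently committed loads: rows scanned per file
SELECT query, filename, lines_scanned, curtime
FROM stl_load_commits
ORDER BY curtime DESC
LIMIT 20;

-- Details for rows that failed to load
SELECT query, filename, line_number, colname, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 20;
```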

Is COPY transactional?

Yes. COPY runs within a transaction, so if the load fails it rolls back entirely. MAXERROR lets the load continue past a set number of bad rows, and ACCEPTINVCHARS replaces invalid UTF-8 characters instead of rejecting the row.
