Use the COPY command to load CSV files from Amazon S3 (or other sources) into Redshift tables quickly and in parallel.
COPY ingests data in parallel from multiple files, skips client-side bottlenecks, and automatically commits in batches. INSERT loops are slow and prone to timeouts.
Create the destination table, upload the CSV files to S3, and grant Redshift read access via an IAM role or AWS access keys. Confirm file encoding and delimiter.
Use COPY schema.table FROM 's3://bucket/prefix/' IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole' CSV;
.Adjust options for header rows, NULL handling, and compression.
DATEFORMAT
parses order_date, TIMEFORMAT
covers timestamps, ACCEPTINVCHARS
lets you keep emoji in product names, and REGION
reduces cross-region data transfer.
Point COPY at an S3 prefix rather than a single file.Redshift spawns one slice per file, so split large exports into many 100–250 MB files for best throughput.
Load CSVs into a staging table, run validation queries, then INSERT INTO
production tables inside a single transaction. Roll back if row counts or checksums don’t match expectations.
Compare STL_LOAD_ERRORS
, count rows, and verify sample records.Automate checks in a CI pipeline or Airflow DAG for nightly imports.
SELECT err_reason, COUNT(*) FROM stl_load_errors WHERE filename LIKE '%customers%' GROUP BY err_reason;
1) Compress CSVs with GZIP to cut S3 costs. 2) Use MANIFEST
files for strict file lists. 3) Enable STATUPDATE
so queries stay fast.4) Archive processed files to an "_loaded/" folder.
COPY into an empty table skips vacuuming, but frequent incremental loads need VACUUM DELETE ONLY
and ANALYZE
on affected tables to keep query plans optimal.
.
No. Upload the files to S3 (or an Amazon DataShare, HTTP endpoint, or DynamoDB) first. Redshift reads from cloud storage only.
COPY parses standard CSV quoting by default. Ensure your export wraps fields with "
. Use ESCAPE
if backslashes appear in data.
Only if you manage it. Staging tables plus primary keys let you deduplicate. Alternatively, use MANIFEST
and move processed files to a different prefix.