How to import CSV in Redshift

Galaxy Glossary

How do I import a CSV file into Amazon Redshift?

Use the COPY command to load CSV files from Amazon S3 (or other sources) into Redshift tables quickly and in parallel.


Description

Why choose COPY over INSERT for CSV loads?

COPY ingests data in parallel from multiple files, bypasses client-side bottlenecks, and loads everything in a single efficient transaction. Row-by-row INSERT loops are slow and prone to timeouts.
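For contrast, a quick sketch (table, bucket, and role names are hypothetical): a row-by-row INSERT loop versus one parallel COPY.

```sql
-- Slow: one network round trip per statement
INSERT INTO staging.orders (order_id, order_date, total) VALUES (1, '2023-10-01', 19.99);
INSERT INTO staging.orders (order_id, order_date, total) VALUES (2, '2023-10-01', 5.49);

-- Fast: one parallel, batched load from S3
COPY staging.orders
FROM 's3://ecom-data/exports/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
CSV;
```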

What prerequisites must I set up?

Create the destination table, upload the CSV files to S3, and grant Redshift read access via an IAM role or AWS access keys. Confirm file encoding and delimiter.
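A minimal setup sketch, assuming a hypothetical staging.orders table whose column order matches the CSV layout:

```sql
-- Column order must match the CSV (or list columns explicitly in COPY)
CREATE TABLE staging.orders (
    order_id    BIGINT,
    customer_id BIGINT,
    order_date  DATE,
    total       DECIMAL(12,2)
);
```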

How does the basic COPY syntax look?

Use COPY schema.table FROM 's3://bucket/prefix/' IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole' CSV; then adjust options for header rows, NULL handling, and compression.

Which options matter for ecommerce data?

DATEFORMAT parses order_date, TIMEFORMAT covers timestamps, ACCEPTINVCHARS prevents load failures on invalid UTF-8 characters in user-entered product names, and REGION tells COPY where the bucket lives when it differs from the cluster's region.
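Putting those options together in one load (bucket, role, and format strings are illustrative):

```sql
COPY staging.orders
FROM 's3://ecom-data/exports/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
CSV
IGNOREHEADER 1
DATEFORMAT 'YYYY-MM-DD'
TIMEFORMAT 'YYYY-MM-DD HH:MI:SS'
ACCEPTINVCHARS
REGION 'us-east-1';
```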

How can I load multiple CSVs at once?

Point COPY at an S3 prefix rather than a single file. Redshift distributes the files across slices and loads them in parallel, so split large exports into many 100–250 MB files for best throughput.

What’s a safe staging workflow?

Load CSVs into a staging table, run validation queries, then INSERT INTO production tables inside a single transaction. Roll back if row counts or checksums don’t match expectations.
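One way to sketch that workflow (schema and table names are hypothetical); run your validation queries between the load and the COMMIT, and issue ROLLBACK instead on a mismatch:

```sql
BEGIN;

-- Promote validated rows from staging to production
INSERT INTO prod.orders
SELECT * FROM staging.orders;

-- If row counts or checksums don't match, run ROLLBACK; instead
COMMIT;
```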

How do I validate a successful load?

Compare STL_LOAD_ERRORS, count rows, and verify sample records. Automate checks in a CI pipeline or Airflow DAG for nightly imports.

Example validation snippet

SELECT err_reason, COUNT(*) FROM stl_load_errors WHERE filename LIKE '%customers%' GROUP BY err_reason;
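A row-count check can complement the error query; STL_LOAD_COMMITS records lines scanned per loaded file (the filename pattern and staging table are illustrative):

```sql
-- Lines scanned during the load, per file
SELECT filename, lines_scanned
FROM stl_load_commits
WHERE filename LIKE '%customers%'
ORDER BY curtime DESC;

-- Rows that actually landed in staging
SELECT COUNT(*) FROM staging.customers;
```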

Best practices for durable loads

1) Compress CSVs with GZIP to cut S3 costs. 2) Use MANIFEST files for strict file lists. 3) Enable STATUPDATE so queries stay fast. 4) Archive processed files to an "_loaded/" folder.
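A manifest is a small JSON file in S3 that lists exactly which objects to load; pass its S3 path to COPY with the MANIFEST option. A sketch (object keys are illustrative):

```json
{
  "entries": [
    {"url": "s3://ecom-data/exports/orders/part-0000.csv.gz", "mandatory": true},
    {"url": "s3://ecom-data/exports/orders/part-0001.csv.gz", "mandatory": true}
  ]
}
```

Setting "mandatory": true makes COPY fail if a listed file is missing, rather than silently loading a partial batch.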

When should I VACUUM and ANALYZE?

COPY into an empty table skips vacuuming, but frequent incremental loads need VACUUM DELETE ONLY and ANALYZE on affected tables to keep query plans optimal.
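For example, after a run of incremental loads (table name hypothetical):

```sql
-- Reclaim space from deleted rows without a full re-sort
VACUUM DELETE ONLY prod.orders;

-- Refresh planner statistics on the freshly loaded table
ANALYZE prod.orders;
```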



How to import CSV in Redshift Example Usage


-- Load customer orders from CSV into staging_orders
COPY staging.orders
FROM 's3://ecom-data/exports/orders/2023-10-01/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
CSV
IGNOREHEADER 1
DATEFORMAT 'auto'
TIMEFORMAT 'auto'
GZIP
STATUPDATE ON;

How to import CSV in Redshift Syntax


COPY <schema>.<table>
FROM 's3://<bucket>/<prefix>/'
IAM_ROLE 'arn:aws:iam::<account-id>:role/<role-name>'
[CSV | DELIMITER ',' ]
[IGNOREHEADER <n>]
[DATEFORMAT 'auto']
[TIMEFORMAT 'auto']
[NULL AS '<null-string>']
[ACCEPTINVCHARS]
[GZIP | BZIP2]
[REGION '<aws-region>']
[STATUPDATE ON]
[MANIFEST];

Common Mistakes

Frequently Asked Questions (FAQs)

Can I load CSVs directly from my local machine?

No. Upload the files to Amazon S3 first; COPY can also read from Amazon EMR, DynamoDB, or remote hosts over SSH, but never from your local machine.

How do I handle commas inside quoted strings?

COPY parses standard CSV quoting by default. Ensure your export wraps fields in double quotes. For non-CSV (DELIMITER-based) loads, add ESCAPE if backslashes appear in the data.
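If your export uses a non-default quote character, you can name it explicitly with the CSV QUOTE option (table and quote character here are illustrative):

```sql
COPY staging.products
FROM 's3://ecom-data/exports/products/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
CSV QUOTE '"';
```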

Is COPY idempotent?

Only if you manage it. Staging tables plus primary keys let you deduplicate. Alternatively, use MANIFEST and move processed files to a different prefix.
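A common delete-then-insert merge pattern for deduplication (schema, table, and key names are hypothetical):

```sql
BEGIN;

-- Drop any rows already present, then insert the fresh batch
DELETE FROM prod.orders
USING staging.orders
WHERE prod.orders.order_id = staging.orders.order_id;

INSERT INTO prod.orders
SELECT * FROM staging.orders;

COMMIT;
```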
