COPY ingests external files (CSV, JSON, Parquet, etc.) into Redshift tables in parallel for fast, scalable bulk loading.
COPY ingests data files stored outside the cluster and writes them into a target table using Redshift’s massively parallel architecture. It delivers far higher throughput than INSERT, supports automatic compression, and validates each row while loading.
COPY handles CSV, JSON, Parquet, Avro, ORC, and fixed-width text. It also accepts GZIP- or BZIP2-compressed files. Pick the FORMAT option that matches your files.
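For illustration, a Parquet load and a gzipped CSV load differ only in the format clause. The table, bucket, and role ARN below are placeholders:

    -- Columnar source files: Parquet compresses itself, so no GZIP option is needed
    COPY sales
    FROM 's3://my-bucket/parquet/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    FORMAT AS PARQUET;

    -- Delimited text that was gzipped before upload
    COPY sales
    FROM 's3://my-bucket/csv/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    CSV GZIP;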
Place source files in an S3 path that the cluster can reach. Attach an IAM role that grants s3:GetObject (and s3:ListBucket for prefix loads). If you pass key-based credentials in the command instead, store the keys in AWS Secrets Manager rather than hard-coding them.
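Inside the command, the role is referenced by its ARN, or with IAM_ROLE DEFAULT when a default role is attached to the cluster. A minimal sketch with a placeholder path:

    -- Uses the cluster's default IAM role instead of an explicit ARN
    COPY sales FROM 's3://my-bucket/stage/' IAM_ROLE DEFAULT CSV;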
Define the table's columns in the same order as the file, or list the target columns in the COPY command; use IGNOREHEADER 1 to skip a header row of column names.
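A sketch of both approaches, assuming a hypothetical sales table whose CSV file starts with a header row:

    -- Table columns declared in the same order as the file
    CREATE TABLE sales (sale_id INT, sale_date DATE, amount DECIMAL(10,2));

    -- Explicit column list plus IGNOREHEADER to skip the header row
    COPY sales (sale_id, sale_date, amount)
    FROM 's3://my-bucket/csv/sales.csv'
    IAM_ROLE DEFAULT
    CSV IGNOREHEADER 1;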
Specify the S3 URI, IAM role, and CSV settings. Use MAXERROR to control error tolerance, COMPUPDATE ON to apply compression encodings automatically, and STATUPDATE ON to refresh table statistics after the load.
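Putting the pieces together, a full load might look like the sketch below; every name and the role ARN are placeholders:

    COPY sales
    FROM 's3://my-bucket/csv/2024/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    CSV IGNOREHEADER 1 GZIP
    MAXERROR 10        -- tolerate up to 10 bad rows before the load fails
    COMPUPDATE ON      -- choose compression encodings automatically
    STATUPDATE ON;     -- refresh table statistics after the load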
When the source files are scattered across different prefixes or buckets, generate a JSON manifest listing each object and pass MANIFEST to COPY. Redshift loads only the referenced files.
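A manifest is a small JSON file of the form below (object paths are placeholders), and COPY then points at the manifest object rather than at a prefix:

    {"entries": [
      {"url": "s3://my-bucket/2024/part-0001.csv.gz", "mandatory": true},
      {"url": "s3://other-bucket/extra/part-0002.csv.gz", "mandatory": true}
    ]}

    COPY sales
    FROM 's3://my-bucket/manifests/sales.manifest'
    IAM_ROLE DEFAULT
    CSV GZIP
    MANIFEST;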
Add the JSON parameter pointing to a JSONPaths file or ‘auto’. Redshift flattens the JSON into the table’s columns during load.
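Two common variants, with placeholder paths: 'auto' matches top-level JSON keys to column names, while a JSONPaths file maps fields explicitly.

    -- Let Redshift match top-level JSON keys to column names
    COPY events FROM 's3://my-bucket/events/' IAM_ROLE DEFAULT JSON 'auto';

    -- Or map nested fields explicitly with a JSONPaths file such as
    -- {"jsonpaths": ["$.event_id", "$.user.name", "$.ts"]}
    COPY events FROM 's3://my-bucket/events/' IAM_ROLE DEFAULT
    JSON 's3://my-bucket/jsonpaths/events.json';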
Stage data in equal-sized files (100 MB–1 GB) to maximize parallelism. Compress files to cut transfer time. Grant least-privilege IAM roles and audit completed loads through STL_LOAD_COMMITS.
No. COPY cannot read files from a local machine. Stage the file in S3 first or use the Redshift Data API to insert small data sets.
Query STL_LOAD_COMMITS and STL_LOAD_ERRORS for row counts, error details, and load times. CloudWatch also captures COPY logs.
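A quick check after a load might look like this:

    -- Files committed by recent loads, with rows scanned per file
    SELECT query, TRIM(filename) AS file, lines_scanned, curtime
    FROM stl_load_commits
    ORDER BY curtime DESC
    LIMIT 10;

    -- Rows rejected by recent loads and why
    SELECT query, TRIM(filename) AS file, line_number, colname, TRIM(err_reason) AS reason
    FROM stl_load_errors
    ORDER BY starttime DESC
    LIMIT 10;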
Yes. COPY is atomic: if the number of rejected rows exceeds MAXERROR (0 by default), the entire load rolls back. Raising MAXERROR lets the load tolerate a limited number of bad rows, and ACCEPTINVCHARS replaces invalid characters instead of rejecting the row.