The COPY command bulk-loads CSV, JSON, Parquet, or Avro files stored in Azure Blob Storage directly into an Amazon Redshift table. Because it parallelizes reads across the Redshift compute nodes, ingestion is much faster than row-by-row INSERT loops.
Create an Azure AD application or shared access signature (SAS) that grants Blob Reader rights on the container. Supply its AZURE_ACCOUNT_KEY or SAS_TOKEN in the CREDENTIALS clause of COPY, and make sure Redshift's public subnet can reach <storage-acct>.blob.core.windows.net.
COPY target_table
FROM 'azure://<storage-acct>.blob.core.windows.net/<container>/path/'
CREDENTIALS (
  AZURE_ACCOUNT_NAME '<storage-acct>',
  AZURE_ACCOUNT_KEY '<account-key>'
)
FORMAT AS CSV
DELIMITER ','
IGNOREHEADER 1
TIMEFORMAT 'auto'
DATEFORMAT 'auto'
REGION 'us-east-1'
COMPUPDATE ON
STATUPDATE ON;
FORMAT selects CSV, JSON, PARQUET, or AVRO. DELIMITER sets the column separator for CSV files, and IGNOREHEADER skips the given number of header rows. COMPUPDATE and STATUPDATE refresh the table's compression encodings and optimizer statistics during the load.
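For Parquet or Avro sources, or when you prefer a scoped SAS token over the account key, the clause shapes stay the same. A minimal sketch, assuming SAS_TOKEN accepts the token string exactly as issued by Azure:

-- Sketch: load Parquet files using a SAS token instead of the account key.
COPY target_table
FROM 'azure://<storage-acct>.blob.core.windows.net/<container>/path/'
CREDENTIALS (
  AZURE_ACCOUNT_NAME '<storage-acct>',
  SAS_TOKEN '<sas-token>'
)
FORMAT AS PARQUET;
-- Parquet and Avro are self-describing, so DELIMITER and IGNOREHEADER do not apply.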
CREATE TABLE Customers (
  id INT,
  name VARCHAR(50),
  email VARCHAR(70),
  created_at TIMESTAMP
);

COPY Customers
FROM 'azure://galaxydata.blob.core.windows.net/ecomm/customers/'
CREDENTIALS (
  AZURE_ACCOUNT_NAME 'galaxydata',
  AZURE_ACCOUNT_KEY 'AZURE_KEY'
)
FORMAT AS CSV
DELIMITER ','
IGNOREHEADER 1;
Split files into 100-250 MB parts so Redshift can parallelize the load across slices, and compress them with gzip to cut transfer time. For massive ingests, disable STATUPDATE during the load, then run ANALYZE afterward.
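A sketch of that tuning, reusing the Customers example above and assuming the source blobs are gzip-compressed parts (GZIP and STATUPDATE OFF follow standard Redshift COPY syntax):

-- Load pre-split, gzip-compressed parts without refreshing statistics inline.
COPY Customers
FROM 'azure://galaxydata.blob.core.windows.net/ecomm/customers/'
CREDENTIALS (
  AZURE_ACCOUNT_NAME 'galaxydata',
  AZURE_ACCOUNT_KEY 'AZURE_KEY'
)
FORMAT AS CSV
DELIMITER ','
IGNOREHEADER 1
GZIP
STATUPDATE OFF;

-- Rebuild optimizer statistics once the bulk load has finished.
ANALYZE Customers;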
A wrong region or endpoint stalls COPY; ensure <storage-acct>.blob.core.windows.net is reachable from the cluster. A mismatched column count triggers "Load error: Invalid field count"; validate the delimiter and header rows against the table definition.
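When a load fails, Redshift records per-row detail in the system table STL_LOAD_ERRORS, which works the same regardless of the source store. A quick diagnostic query:

-- Inspect the most recent load failures: offending file, line, column, and reason.
SELECT starttime,
       filename,
       line_number,
       colname,
       err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;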
Can one COPY command load files of different formats from the same path? No. All files in the path must share the format declared in the COPY statement.
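If a container holds mixed formats, run one COPY per format, each scoped to a format-homogeneous prefix. A sketch, assuming hypothetical csv/ and parquet/ prefixes under the same container:

-- One COPY per format, each pointed at a prefix holding only that format.
COPY Customers
FROM 'azure://galaxydata.blob.core.windows.net/ecomm/customers/csv/'
CREDENTIALS (AZURE_ACCOUNT_NAME 'galaxydata', AZURE_ACCOUNT_KEY 'AZURE_KEY')
FORMAT AS CSV
DELIMITER ','
IGNOREHEADER 1;

COPY Customers
FROM 'azure://galaxydata.blob.core.windows.net/ecomm/customers/parquet/'
CREDENTIALS (AZURE_ACCOUNT_NAME 'galaxydata', AZURE_ACCOUNT_KEY 'AZURE_KEY')
FORMAT AS PARQUET;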
Does COPY delete or move the source blobs after loading? No. COPY only reads data; you must remove or archive blobs separately if required.
How should credentials be kept out of scripts? Store keys or SAS tokens in AWS Secrets Manager and inject them into COPY via a stored procedure or parameterized script.
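A minimal sketch of the parameterized-script pattern, assuming a psql-style client where the :azure_key variable is substituted at run time by a wrapper that has already fetched the secret from AWS Secrets Manager (the variable name and wrapper are hypothetical):

-- :azure_key is supplied by the calling script; the key never lives in the SQL file.
COPY Customers
FROM 'azure://galaxydata.blob.core.windows.net/ecomm/customers/'
CREDENTIALS (
  AZURE_ACCOUNT_NAME 'galaxydata',
  AZURE_ACCOUNT_KEY :'azure_key'
)
FORMAT AS CSV
DELIMITER ','
IGNOREHEADER 1;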