Use EXPORT DATA, S3, and COPY to lift data out of BigQuery and load it into Amazon Redshift quickly and reliably.
Teams migrate to consolidate analytics, cut costs, or align with AWS tooling. Redshift integrates tightly with S3, Glue, and IAM, simplifying downstream reporting and ETL.
1) Export BigQuery tables to Google Cloud Storage (GCS).
2) Transfer those files to Amazon S3.
3) Create matching tables in Redshift.
4) COPY the files into Redshift.
5) Validate row counts and data types.
Run EXPORT DATA to write compressed Parquet files. Parquet preserves the schema and is column-oriented, which makes it ideal for Redshift.
EXPORT DATA
  OPTIONS(
    uri='gs://bq-exports/customers_*.parquet',
    format='PARQUET',
    overwrite=true)
AS
SELECT * FROM `project.dataset.Customers`;
Use gsutil rsync plus aws s3 cp, or a one-time transfer service. Ensure the S3 bucket is in the same region as your Redshift cluster to avoid cross-region transfer fees and slow COPY performance.
Match column names and types, and choose sort and distribution keys that fit your query patterns. Redshift is PostgreSQL-based, so VARCHAR length limits and TIMESTAMP precision matter.
CREATE TABLE public.Customers (
id INT ENCODE az64,
name VARCHAR(255),
email VARCHAR(320),
created_at TIMESTAMP
) DISTKEY(id) SORTKEY(id);
Use the COPY command with the FORMAT AS PARQUET option. Provide IAM credentials or a role that grants read access to the S3 objects.
COPY public.Customers
FROM 's3://bq-exports/customers_'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS PARQUET;
Compare counts and checksums. Run SELECT COUNT(*) in both warehouses. For spot checks, compare min/max dates or hashes of primary keys.
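As a minimal sketch, the same spot-check query can be run in both warehouses and the results diffed. It sticks to counts, distinct keys, and date ranges because the two engines ship different built-in hash functions, so hash-based checksums only line up if you compute the same hash on both sides. Table and column names follow the Customers example above.
-- Run the identical query in Redshift and BigQuery, then compare results.
SELECT
  COUNT(*)           AS row_count,
  COUNT(DISTINCT id) AS distinct_ids,
  MIN(created_at)    AS min_created,
  MAX(created_at)    AS max_created
FROM public.Customers;  -- in BigQuery: FROM `project.dataset.Customers`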
Use proper distribution keys (often customer_id) and sort keys (typically dates) on fact tables. Compress UTF-8 text columns with ZSTD encoding. Always load into staging tables first, then INSERT … SELECT into production tables; a fact-table DDL sketch with these choices follows the staging example below.
CREATE TABLE staging.Orders (LIKE public.Orders);
COPY staging.Orders FROM 's3://bq-exports/orders_'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS PARQUET;
INSERT INTO public.Orders
SELECT * FROM staging.Orders;
ANALYZE public.Orders;
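As a sketch of the distribution-key, sort-key, and ZSTD advice above, here is an illustrative DDL for public.Orders; the column list is an assumption made for this example rather than a schema defined earlier in the guide.
-- Illustrative fact table: distribute on customer_id, sort by date,
-- ZSTD-compress free-form text. Columns are assumed for the example.
CREATE TABLE public.Orders (
  id BIGINT ENCODE az64,
  customer_id BIGINT ENCODE az64,
  order_date DATE,
  amount DECIMAL(12,2) ENCODE az64,
  status VARCHAR(32) ENCODE zstd,
  last_updated TIMESTAMP ENCODE az64
) DISTKEY(customer_id) SORTKEY(order_date);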
Schedule BigQuery EXPORT DATA jobs and Redshift COPY runs via AWS Glue or Airflow. Use partitioned folders (e.g., /order_date=2024-05-31/) to load only new data.
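A sketch of what a partition-scoped export could look like, assuming the BigQuery Orders table has an order_date column; the dated folder in the URI lets the scheduled COPY target only that day's files. Paths and the date value are placeholders.
-- Illustrative daily export into a dated folder; the matching COPY
-- would point at s3://bq-exports/orders/order_date=2024-05-31/.
EXPORT DATA
  OPTIONS(
    uri='gs://bq-exports/orders/order_date=2024-05-31/orders_*.parquet',
    format='PARQUET',
    overwrite=true)
AS
SELECT * FROM `project.dataset.Orders`
WHERE order_date = DATE '2024-05-31';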
CSV exports work too, but Parquet is columnar and smaller, making COPY faster. With CSV you must specify the delimiter, NULL handling (NULL AS), and an explicit column list, as in the sketch below.
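A sketch of the CSV variant of the earlier COPY; the delimiter, null string, and header setting shown are assumptions about how the files were exported and should be matched to the actual export options.
-- Illustrative CSV load; DELIMITER, NULL AS, and IGNOREHEADER must
-- match how the files were written.
COPY public.Customers (id, name, email, created_at)
FROM 's3://bq-exports/customers_'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS CSV
DELIMITER ','
NULL AS '\N'
IGNOREHEADER 1;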
For incremental syncs, add a last_updated column in BigQuery, export only new partitions, and run an upsert (MERGE) in Redshift, as sketched below.
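A minimal MERGE sketch, assuming the new partition has already been COPY'd into staging.Orders and that both tables carry an id key plus the columns shown; the column lists are assumptions to adapt to the real schema.
-- Illustrative upsert from staging into production. Requires that each
-- target row matches at most one staging row (dedupe staging first).
MERGE INTO public.Orders
USING staging.Orders AS s
ON public.Orders.id = s.id
WHEN MATCHED THEN
  UPDATE SET amount = s.amount, status = s.status, last_updated = s.last_updated
WHEN NOT MATCHED THEN
  INSERT (id, customer_id, order_date, amount, status, last_updated)
  VALUES (s.id, s.customer_id, s.order_date, s.amount, s.status, s.last_updated);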