Use EXPORT DATA, S3, and COPY to lift data out of BigQuery and load it into Amazon Redshift quickly and reliably.
Teams migrate to consolidate analytics, cut costs, or align with AWS tooling. Redshift integrates tightly with S3, Glue, and IAM, simplifying downstream reporting and ETL.
1) Export BigQuery tables to Google Cloud Storage (GCS).
2) Transfer those files to Amazon S3.
3) Create matching tables in Redshift.
4) COPY the files into Redshift.
5) Validate row counts and data types.
Run EXPORT DATA to write compressed Parquet files. Parquet preserves the schema and is column-oriented, which makes it ideal for Redshift.
EXPORT DATA
  OPTIONS(
    uri='gs://bq-exports/customers_*.parquet',
    format='PARQUET',
    overwrite=true)
AS
SELECT * FROM `project.dataset.Customers`;
Use gsutil rsync plus aws s3 cp, or a one-time transfer service. Ensure the S3 bucket is in the same region as your Redshift cluster to avoid cross-region transfer fees and slow COPY performance.
Match column names and types, and choose sort and distribution keys that fit your query patterns. Redshift is PostgreSQL-based, so VARCHAR length limits and TIMESTAMP precision matter.
CREATE TABLE public.Customers (
id INT ENCODE az64,
name VARCHAR(255),
email VARCHAR(320),
created_at TIMESTAMP
) DISTKEY(id) SORTKEY(id);
Use the COPY command with the FORMAT AS PARQUET option. Provide IAM credentials or a role that grants read access to the S3 objects.
COPY public.Customers
FROM 's3://bq-exports/customers_'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS PARQUET;
Compare counts and checksums. Run SELECT COUNT(*) in both warehouses. For spot checks, compare min/max dates or hashes of primary keys.
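As a minimal sketch, the same spot-check query can be run in both warehouses and the results diffed. It sticks to counts, distinct keys, and date ranges because the two engines ship different built-in hash functions, so hash-based checksums only line up if you compute the same hash on both sides. Table and column names follow the Customers example above.
-- Run the identical query in Redshift and BigQuery, then compare results.
SELECT
  COUNT(*)           AS row_count,
  COUNT(DISTINCT id) AS distinct_ids,
  MIN(created_at)    AS min_created,
  MAX(created_at)    AS max_created
FROM public.Customers;  -- in BigQuery: FROM `project.dataset.Customers`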
Use proper distribution keys (often customer_id) and sort keys (typically dates) on fact tables. Compress UTF-8 text columns with ZSTD encoding. Always load into staging tables first, then INSERT … SELECT into production tables; a fact-table DDL sketch with these choices follows the staging example below.
CREATE TABLE staging.Orders (LIKE public.Orders);
COPY staging.Orders FROM 's3://bq-exports/orders_'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS PARQUET;
INSERT INTO public.Orders
SELECT * FROM staging.Orders;
ANALYZE public.Orders;
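As a sketch of the distribution-key, sort-key, and ZSTD advice above, here is an illustrative DDL for public.Orders; the column list is an assumption made for this example rather than a schema defined earlier in the guide.
-- Illustrative fact table: distribute on customer_id, sort by date,
-- ZSTD-compress free-form text. Columns are assumed for the example.
CREATE TABLE public.Orders (
  id BIGINT ENCODE az64,
  customer_id BIGINT ENCODE az64,
  order_date DATE,
  amount DECIMAL(12,2) ENCODE az64,
  status VARCHAR(32) ENCODE zstd,
  last_updated TIMESTAMP ENCODE az64
) DISTKEY(customer_id) SORTKEY(order_date);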
Schedule BigQuery EXPORT DATA jobs and Redshift COPY runs via AWS Glue or Airflow. Use partitioned folders (e.g., /order_date=2024-05-31/) to load only new data.
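A sketch of what a partition-scoped export could look like, assuming the BigQuery Orders table has an order_date column; the dated folder in the URI lets the scheduled COPY target only that day's files. Paths and the date value are placeholders.
-- Illustrative daily export into a dated folder; the matching COPY
-- would point at s3://bq-exports/orders/order_date=2024-05-31/.
EXPORT DATA
  OPTIONS(
    uri='gs://bq-exports/orders/order_date=2024-05-31/orders_*.parquet',
    format='PARQUET',
    overwrite=true)
AS
SELECT * FROM `project.dataset.Orders`
WHERE order_date = DATE '2024-05-31';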
CSV exports work too, but Parquet is columnar and smaller, making COPY faster. With CSV you must specify the delimiter, NULL handling (NULL AS), and an explicit column list, as in the sketch below.
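A sketch of the CSV variant of the earlier COPY; the delimiter, null string, and header setting shown are assumptions about how the files were exported and should be matched to the actual export options.
-- Illustrative CSV load; DELIMITER, NULL AS, and IGNOREHEADER must
-- match how the files were written.
COPY public.Customers (id, name, email, created_at)
FROM 's3://bq-exports/customers_'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS CSV
DELIMITER ','
NULL AS '\N'
IGNOREHEADER 1;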
For incremental syncs, add a last_updated column in BigQuery, export only new partitions, and run an upsert (MERGE) in Redshift, as sketched below.
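A minimal MERGE sketch, assuming the new partition has already been COPY'd into staging.Orders and that both tables carry an id key plus the columns shown; the column lists are assumptions to adapt to the real schema.
-- Illustrative upsert from staging into production. Requires that each
-- target row matches at most one staging row (dedupe staging first).
MERGE INTO public.Orders
USING staging.Orders AS s
ON public.Orders.id = s.id
WHEN MATCHED THEN
  UPDATE SET amount = s.amount, status = s.status, last_updated = s.last_updated
WHEN NOT MATCHED THEN
  INSERT (id, customer_id, order_date, amount, status, last_updated)
  VALUES (s.id, s.customer_id, s.order_date, s.amount, s.status, s.last_updated);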