How to Migrate from BigQuery to Redshift in PostgreSQL

Galaxy Glossary

How do I migrate data from BigQuery to Amazon Redshift efficiently?

Use BigQuery's EXPORT DATA, Amazon S3, and the Redshift COPY command to lift data out of BigQuery and load it into Amazon Redshift quickly and reliably.

Why move data from BigQuery to Redshift?

Teams typically migrate to consolidate analytics, cut costs, or standardize on AWS tooling. Redshift integrates tightly with S3, Glue, and IAM, which simplifies downstream reporting and ETL.

What are the high-level steps?

1. Export BigQuery tables to Google Cloud Storage (GCS).
2. Transfer those files to Amazon S3.
3. Create matching tables in Redshift.
4. COPY the files into Redshift.
5. Validate row counts and data types.

How do I export BigQuery tables?

Run EXPORT DATA to write compressed Parquet files. Parquet keeps schema and is column-oriented—ideal for Redshift.

Example BigQuery export

EXPORT DATA OPTIONS(
  uri='gs://bq_exports/customers_*.parquet',
  format='PARQUET',
  compression='SNAPPY',
  overwrite=true)
AS
SELECT * FROM `project.dataset.Customers`;

How do I move files from GCS to S3?

Use gsutil rsync plus aws s3 cp/sync, or a one-time managed transfer job. Make sure the S3 bucket is in the same AWS region as your Redshift cluster so COPY stays fast and you avoid cross-region data transfer charges. Note that S3 bucket names cannot contain underscores, so the examples below write to bq-exports on the AWS side.

How do I create compatible tables in Redshift?

Match column names and types, and choose distribution and sort keys that fit your query patterns. Redshift is PostgreSQL-based, so VARCHAR length limits and TIMESTAMP precision matter, and BigQuery INT64 columns should map to BIGINT.

Sample DDL for Customers

CREATE TABLE public.Customers (
  id BIGINT ENCODE az64,   -- BigQuery INT64 maps to BIGINT
  name VARCHAR(255),
  email VARCHAR(320),
  created_at TIMESTAMP
) DISTKEY(id) SORTKEY(id);
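
As a rough guide, the type mappings below cover most schemas; verify lengths and precision against your actual data before creating tables.

-- Typical BigQuery → Redshift type mappings
-- STRING     → VARCHAR(n)        (choose n; Redshift enforces the limit, BigQuery does not)
-- INT64      → BIGINT
-- FLOAT64    → DOUBLE PRECISION
-- NUMERIC    → DECIMAL(38, 9)
-- BOOL       → BOOLEAN
-- TIMESTAMP  → TIMESTAMP
-- DATE       → DATE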

How do I load Parquet files into Redshift?

Use the COPY command with the FORMAT AS PARQUET option, and provide an IAM role (or credentials) that grants read access to the S3 bucket and its objects.

Redshift COPY example

COPY public.Customers
FROM 's3://bq-exports/customers_'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS PARQUET;

How do I verify the migration?

Compare counts and checksums. Run SELECT COUNT(*) in both warehouses. For spot checks, compare max/min dates or hashes of primary keys.
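
For example, a quick Redshift-side check on the Customers table might look like this (run the equivalent COUNT and MIN/MAX query in BigQuery and compare; the hash sum is just one spot-check approach):

-- Row count and date range (compare with the same query in BigQuery)
SELECT COUNT(*) AS row_count,
       MIN(created_at) AS min_created,
       MAX(created_at) AS max_created
FROM public.Customers;

-- Lightweight spot check: sum a hash fragment of the primary key
SELECT SUM(STRTOL(LEFT(MD5(id::VARCHAR), 8), 16)) AS id_checksum
FROM public.Customers;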

What performance tweaks help?

Choose distribution keys (often a join key such as customer_id) and sort keys (usually dates) on fact tables. Compress wide text columns with ZSTD encoding. Load into staging tables first, then INSERT … SELECT into production tables.

Optimize Orders load

CREATE SCHEMA IF NOT EXISTS staging;
CREATE TABLE staging.Orders (LIKE public.Orders);
COPY staging.Orders FROM 's3://bq-exports/orders_'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS PARQUET;

INSERT INTO public.Orders
SELECT * FROM staging.Orders;
ANALYZE public.Orders;

How to automate ongoing syncs?

Schedule BigQuery EXPORT DATA jobs and Redshift COPY via AWS Glue or Airflow. Use partitioned folders (e.g., /order_date=2024-05-31/) to load only new data.
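
For example, a daily job might export a single partition from BigQuery and load only that prefix into staging (the order_date folder layout comes from the example above; the Orders table and column names are assumptions to adjust for your schema):

-- BigQuery: export one day's partition
EXPORT DATA OPTIONS(
  uri='gs://bq_exports/orders/order_date=2024-05-31/orders_*.parquet',
  format='PARQUET')
AS SELECT * FROM `project.dataset.Orders`
WHERE order_date = DATE '2024-05-31';

-- Redshift: load only that prefix, then merge into production
COPY staging.Orders
FROM 's3://bq-exports/orders/order_date=2024-05-31/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS PARQUET;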

Why How to Migrate from BigQuery to Redshift in PostgreSQL is important

How to Migrate from BigQuery to Redshift in PostgreSQL Example Usage


-- Migrate Products table
EXPORT DATA OPTIONS(
  uri='gs://bq_exports/Products_*.parquet',
  format='PARQUET')
AS SELECT id, name, price, stock FROM `project.dataset.Products`;

COPY public.Products
FROM 's3://bq-exports/Products_'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS PARQUET;

How to Migrate from BigQuery to Redshift in PostgreSQL Syntax


-- BigQuery side
EXPORT DATA OPTIONS(
  uri='gs://bq_exports/Customers_*.parquet',
  format='PARQUET')
AS SELECT * FROM `project.dataset.Customers`;

-- Transfer GCS → S3 (shell command, not SQL; gsutil needs AWS credentials
-- configured, for example in its .boto file, to write to s3:// URLs)
gsutil rsync gs://bq_exports/ s3://bq-exports/

-- Redshift side
CREATE TABLE public.Customers (
  id BIGINT,
  name VARCHAR(255),
  email VARCHAR(320),
  created_at TIMESTAMP
);

COPY public.Customers
FROM 's3://bq-exports/Customers_'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS PARQUET;

Common Mistakes

Frequently Asked Questions (FAQs)

Can I load CSV instead of Parquet?

Yes, but Parquet is columnar and smaller, so COPY runs faster. With CSV you must specify a delimiter, a NULL AS marker, and an explicit column list.
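
A CSV load might look like the following (the file prefix, null marker, and header setting are assumptions; adjust them to match how your CSV was exported):

COPY public.Customers (id, name, email, created_at)
FROM 's3://bq-exports/customers_csv/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
DELIMITER ','
NULL AS 'NULL'
IGNOREHEADER 1;  -- drop this line if the files have no header row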

How do I handle incremental updates?

Add a last_updated column in BigQuery, export only new partitions, and run an UPSERT (MERGE) in Redshift.
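
Once the new rows land in a staging table, the Redshift-side upsert can use MERGE (the status and last_updated columns below are placeholders; Orders is assumed to be keyed on id):

MERGE INTO public.Orders
USING staging.Orders s
ON public.Orders.id = s.id
WHEN MATCHED THEN
  UPDATE SET status = s.status, last_updated = s.last_updated
WHEN NOT MATCHED THEN
  INSERT (id, status, last_updated)
  VALUES (s.id, s.status, s.last_updated);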
