Move data from your local PostgreSQL server to Google BigQuery for scalable analytics.
BigQuery eliminates hardware maintenance, scales automatically, and lets analysts query terabytes in seconds. Migrating frees on-prem resources and unifies analytics.
Install pg_dump, gsutil, and the bq CLI, and create a Google Cloud project with a BigQuery dataset. Allow outbound HTTPS from the PostgreSQL host so it can reach Cloud Storage and the BigQuery API.
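A minimal setup might look like the following sketch; the project and dataset names are placeholders:

```bash
# Placeholder project and dataset names; substitute your own
gcloud config set project my-ecommerce-project
bq mk --dataset my-ecommerce-project:ecommerce
```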
Use pg_dump for table-level exports. Run pg_dump -Fc -t Customers -t Orders -t OrderItems -t Products ecommerce > ecommerce.dump. The custom format is compressed and supports selective, parallel restores with pg_restore.
For multi-GB tables, use pg_dump -Fd -j 4 ecommerce -f ./dump_dir to write a directory-format dump (one compressed file per table) and leverage multiple CPU cores with four parallel jobs.
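Before uploading, it is worth confirming the archive contains every table you expect; pg_restore can list the contents of both archive formats:

```bash
# Print the table of contents of the custom-format archive
pg_restore --list ecommerce.dump

# The same check works for a directory-format dump
pg_restore --list ./dump_dir
```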
Create a bucket: gsutil mb gs://ecommerce-dumps. Upload: gsutil cp ecommerce.dump gs://ecommerce-dumps/. Add the -m flag to parallelize the transfer when uploading many files at once.
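As a sketch, combining -m with gsutil's parallel composite upload setting also speeds up single large files (the 150M threshold is an arbitrary example):

```bash
# Create the bucket, then upload; -m parallelizes across files, and objects
# larger than 150 MB are uploaded as parallel composite chunks
gsutil mb gs://ecommerce-dumps
gsutil -m -o "GSUtil:parallel_composite_upload_threshold=150M" \
  cp ecommerce.dump gs://ecommerce-dumps/
```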
Export CSV straight from PostgreSQL: psql -d ecommerce -c "\copy Customers TO 'Customers.csv' WITH (FORMAT csv, HEADER)" (pg_restore cannot emit CSV; it only reproduces SQL or restores into a live database). Then load: bq load --autodetect --source_format=CSV --skip_leading_rows=1 ecommerce.Customers gs://ecommerce-dumps/Customers.csv.
Write a shell loop that exports each table to CSV and calls bq load, as sketched below. For datasets over 1 TB, schedule a BigQuery Data Transfer Service job pointing at the bucket instead.
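A minimal sketch of that loop, assuming the four example tables and the bucket created earlier (error handling and retry logic omitted):

```bash
#!/usr/bin/env bash
set -euo pipefail

TABLES="Customers Orders OrderItems Products"   # tables from the example schema
BUCKET="gs://ecommerce-dumps"
DATASET="ecommerce"

for table in $TABLES; do
  # Export the table to CSV client-side, stage it in the bucket, then load it
  psql -d ecommerce -c "\copy ${table} TO '${table}.csv' WITH (FORMAT csv, HEADER)"
  gsutil cp "${table}.csv" "${BUCKET}/"
  bq load --autodetect --source_format=CSV --skip_leading_rows=1 \
    "${DATASET}.${table}" "${BUCKET}/${table}.csv"
done
```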
Compare row counts by running SELECT COUNT(*) FROM Customers; in both systems. For deeper data-quality checks, checksum numeric columns or sample 1% of rows.
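For example, a quick count comparison for one table; the checksum query assumes a hypothetical numeric total column on Orders:

```bash
# Row counts on both sides
psql -d ecommerce -t -A -c "SELECT COUNT(*) FROM Customers;"
bq query --use_legacy_sql=false 'SELECT COUNT(*) FROM ecommerce.Customers'

# Checksum a numeric column (column name is illustrative)
psql -d ecommerce -t -A -c "SELECT SUM(total) FROM Orders;"
bq query --use_legacy_sql=false 'SELECT SUM(total) FROM ecommerce.Orders'
```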
Add updated_at columns and export only rows changed since the last run using WHERE updated_at > :last_sync. Push the new files to the same bucket and append them to the existing tables in BigQuery.
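One incremental run might look like this sketch; the Orders table and the hard-coded last-sync timestamp are placeholders, and in practice the timestamp would be persisted between runs:

```bash
LAST_SYNC="2024-01-01 00:00:00"   # normally read from a state file or metadata table

# Export only rows changed since the last sync
psql -d ecommerce -c "\copy (SELECT * FROM Orders WHERE updated_at > '${LAST_SYNC}') TO 'Orders_delta.csv' WITH (FORMAT csv, HEADER)"

gsutil cp Orders_delta.csv gs://ecommerce-dumps/

# bq load appends by default, so the delta rows are added to the existing table
bq load --source_format=CSV --skip_leading_rows=1 \
  ecommerce.Orders gs://ecommerce-dumps/Orders_delta.csv
```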
Encoding mismatches break loads; always export in UTF-8 with --encoding=UTF8. Timestamp precision is lost if you let autodetect fall back to STRING; explicitly supply a schema or load newline-delimited JSON instead.
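For instance, an inline schema keeps timestamp and numeric columns typed correctly; the column names here are hypothetical:

```bash
bq load --source_format=CSV --skip_leading_rows=1 \
  ecommerce.Orders gs://ecommerce-dumps/Orders.csv \
  order_id:INTEGER,customer_id:INTEGER,order_date:TIMESTAMP,total:NUMERIC
```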
If the data changes every minute and you only need read access, query it in place through BigQuery's Cloud SQL federation (which requires the PostgreSQL instance, or a replica of it, to run in Cloud SQL) instead of running constant full loads.
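A federated query is a single statement; the connection ID us.pg_conn must be created in BigQuery beforehand and, like the column names, is a placeholder:

```bash
bq query --use_legacy_sql=false \
  'SELECT * FROM EXTERNAL_QUERY("us.pg_conn", "SELECT customer_id, email FROM Customers")'
```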
Export → upload → load → validate. Automate each step with CI jobs or Cloud Functions and monitor failures in Cloud Logging (formerly Stackdriver).
Extended downtime is not required: pg_dump exports a consistent snapshot and takes only ACCESS SHARE locks, so normal reads and writes continue while the dump runs (the --snapshot option can additionally pin the dump to a previously exported snapshot). Schedule only a brief cut-over window to capture the final changes.
Incremental loads can be automated. Use cron, Cloud Functions, or Airflow to export changed rows on a schedule and call bq load, which appends to the existing table by default (--replace would overwrite it).
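With cron, that can be as simple as a nightly entry that runs the export-and-load script (the script path is a placeholder):

```bash
# crontab entry: run the incremental sync every night at 02:00
0 2 * * * /opt/etl/pg_to_bq_sync.sh >> /var/log/pg_to_bq_sync.log 2>&1
```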
BigQuery is column-oriented and does not import PostgreSQL indexes. Rely on clustering and partitioning instead.
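For example, a partitioned and clustered copy of the Orders table could be created like this (order_date and customer_id are illustrative column names):

```bash
bq query --use_legacy_sql=false '
CREATE TABLE ecommerce.Orders_partitioned
PARTITION BY DATE(order_date)
CLUSTER BY customer_id
AS SELECT * FROM ecommerce.Orders'
```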