Migrating from Snowflake to BigQuery means exporting Snowflake data, transferring it to Google Cloud Storage, and loading it into BigQuery while recreating schemas and preserving data types.
Lower cost, deeper GCP integration, and BigQuery’s serverless performance often motivate teams to switch.
Option 1: Snowflake → GCS (COPY INTO) → BigQuery (bq load). Option 2: BigQuery Migration Service for automatic transfers. Option 3: Fivetran, Airbyte, or Dataflow for near-zero-downtime replication.
1 – CREATE STAGE pointing to a GCS bucket.
2 – COPY INTO stage in parallel, compressing files.
3 – LIST to verify objects.
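The three export steps above can be sketched as a small Python helper that builds the Snowflake statements for one table. This is a minimal sketch, not a definitive implementation: the function name, the bucket URL, the storage integration name `gcs_int`, and the pipe delimiter are all assumptions for illustration, and the statements still have to be executed through whichever Snowflake client you use.

```python
def export_statements(table, bucket_url, integration, stage="gcs_stage"):
    """Build the CREATE STAGE, COPY INTO, and LIST statements for one table.

    All names passed in are hypothetical; adjust them to your account.
    """
    prefix = table.lower()
    # Step 1: external stage pointing at the GCS bucket.
    create_stage = (
        f"CREATE STAGE IF NOT EXISTS {stage} "
        f"URL = '{bucket_url}' "
        f"STORAGE_INTEGRATION = {integration};"
    )
    # Step 2: unload to gzip-compressed, pipe-delimited CSV.
    # Snowflake writes multiple files in parallel unless SINGLE = TRUE is set.
    copy_into = (
        f"COPY INTO @{stage}/{prefix}/ FROM {table} "
        "FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP FIELD_DELIMITER = '|' "
        "FIELD_OPTIONALLY_ENCLOSED_BY = '\"') "
        "HEADER = TRUE MAX_FILE_SIZE = 268435456;"
    )
    # Step 3: list the unloaded objects to verify the export.
    list_files = f"LIST @{stage}/{prefix}/;"
    return [create_stage, copy_into, list_files]


if __name__ == "__main__":
    for stmt in export_statements("CUSTOMERS", "gcs://my-bucket/exports/", "gcs_int"):
        print(stmt)
```

Because the export writes a header row and uses `|` as the delimiter, the load step has to be configured to match.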
Issue bq load or schedule BigQuery Data Transfer Service. Define explicit schemas to avoid numeric/string drift.
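As a rough illustration of a load with an explicit schema, the helper below assembles a `bq load` command line. It is a sketch under assumptions: the bucket path and the helper name are hypothetical, and the files are assumed to be pipe-delimited CSV with one header row.

```python
import shlex


def bq_load_command(table, gcs_uri, schema, delimiter="|"):
    """Return the argument list for a bq load of delimited CSV files.

    schema is a list of (column_name, bigquery_type) pairs, passed inline
    so that BigQuery does not have to guess column types.
    """
    inline_schema = ",".join(f"{name}:{col_type}" for name, col_type in schema)
    return [
        "bq", "load",
        "--source_format=CSV",
        f"--field_delimiter={delimiter}",
        "--skip_leading_rows=1",  # assumes the export wrote a header row
        table,
        gcs_uri,
        inline_schema,
    ]


if __name__ == "__main__":
    cmd = bq_load_command(
        "my_dataset.Customers",
        "gs://my-bucket/exports/customers/*.csv.gz",  # hypothetical bucket
        [("id", "INTEGER"), ("created_at", "DATETIME")],  # INTEGER is the legacy name for INT64
    )
    print(shlex.join(cmd))
```

Returning a list rather than a single string lets the command be passed to `subprocess.run` without shell quoting problems around the `|` delimiter.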
Generate DDL with GET_DDL('TABLE', ...) in Snowflake (Snowflake has no SHOW CREATE TABLE), then convert data types (e.g., NUMBER → NUMERIC, VARIANT → JSON) using a simple Python script or BigQuery Migration Service.
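A simple type-conversion script might look like the sketch below. Only the NUMBER → NUMERIC and VARIANT → JSON pairs come from the text above; the rest of the table is a common default mapping and should be treated as an assumption to review against your own data.

```python
import re

# Snowflake base type -> BigQuery type. Extend or override as needed.
TYPE_MAP = {
    "NUMBER": "NUMERIC",  # values beyond NUMERIC's range would need BIGNUMERIC
    "DECIMAL": "NUMERIC",
    "INT": "INT64",
    "INTEGER": "INT64",
    "BIGINT": "INT64",
    "FLOAT": "FLOAT64",
    "DOUBLE": "FLOAT64",
    "VARCHAR": "STRING",
    "STRING": "STRING",
    "TEXT": "STRING",
    "BOOLEAN": "BOOL",
    "DATE": "DATE",
    "TIME": "TIME",
    "TIMESTAMP_NTZ": "DATETIME",
    "TIMESTAMP_LTZ": "TIMESTAMP",
    "TIMESTAMP_TZ": "TIMESTAMP",
    "VARIANT": "JSON",
    "BINARY": "BYTES",
}


def to_bigquery_type(snowflake_type):
    """Map a Snowflake column type such as 'NUMBER(38,0)' to a BigQuery type."""
    # Drop precision/length arguments, e.g. VARCHAR(255) -> VARCHAR.
    base = re.sub(r"\(.*\)", "", snowflake_type).strip().upper()
    if base not in TYPE_MAP:
        # Fail loudly so unmapped types are not silently loaded as STRING.
        raise ValueError(f"no BigQuery mapping defined for {snowflake_type!r}")
    return TYPE_MAP[base]
```

Raising on unknown types is a deliberate choice here: it surfaces columns that need a manual decision instead of letting them drift into STRING.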
Add LAST_MODIFIED timestamp columns and schedule hourly COPY INTO with PATTERN filter. Use BigQuery’s MERGE to upsert.
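The upsert step could be generated along these lines. This is a sketch with assumed names: a staging table holding the latest incremental load, a single-column key, and a `last_modified` column present on both sides. The staging table should contain at most one row per key, otherwise BigQuery rejects the MERGE.

```python
def merge_statement(target, staging, key, columns, modified_col="last_modified"):
    """Build a BigQuery MERGE that upserts staging rows into the target table."""
    # Update every non-key column from the staging row.
    updates = ", ".join(f"{c} = S.{c}" for c in columns if c != key)
    col_list = ", ".join(columns)
    values = ", ".join(f"S.{c}" for c in columns)
    return (
        f"MERGE `{target}` AS T\n"
        f"USING `{staging}` AS S\n"
        f"ON T.{key} = S.{key}\n"
        # Only overwrite when the staged row is newer than the stored one.
        f"WHEN MATCHED AND S.{modified_col} > T.{modified_col} THEN\n"
        f"  UPDATE SET {updates}\n"
        f"WHEN NOT MATCHED THEN\n"
        f"  INSERT ({col_list}) VALUES ({values})"
    )


if __name__ == "__main__":
    print(merge_statement(
        "my_dataset.Customers",
        "my_dataset.Customers_staging",  # hypothetical staging table
        "id",
        ["id", "name", "last_modified"],
    ))
```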
Snowflake COPY INTO writes gzip-compressed CSV files (.csv.gz) to @gcs_stage/customers/. BigQuery bq load then ingests those files into my_dataset.Customers, mapping id → INT64 and created_at → DATETIME.
Re-create primary keys as NOT ENFORCED constraints in BigQuery. BigQuery does not check them and has no UNIQUE constraint, so use dbt tests to validate uniqueness.
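Compare SELECT COUNT(*) results between Snowflake and BigQuery. For spot checks, compare row hashes computed with a function both systems provide (for example MD5 rather than CRC32), or load a sample from one system into the other and diff it with EXCEPT.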
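For a spot check outside either warehouse, a sample of rows fetched from each side can be compared by hash. The sketch below assumes the rows have already been retrieved as tuples by the respective client libraries and that values are formatted identically on both sides (drivers may render decimals and timestamps differently, so normalize first). MD5 is used here only as a convenient fingerprint, not for security.

```python
import hashlib


def row_fingerprint(row):
    """Hash one row after joining its values into a canonical string."""
    canonical = "|".join("" if v is None else str(v) for v in row)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def compare_tables(source_rows, target_rows):
    """Compare row counts and row fingerprints between two row samples."""
    src = {row_fingerprint(r) for r in source_rows}
    tgt = {row_fingerprint(r) for r in target_rows}
    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        "missing_in_target": len(src - tgt),      # in Snowflake sample only
        "unexpected_in_target": len(tgt - src),   # in BigQuery sample only
    }


if __name__ == "__main__":
    snowflake_sample = [(1, "Ann", None), (2, "Bob", "x")]
    bigquery_sample = [(1, "Ann", None), (3, "Cy", "y")]
    print(compare_tables(snowflake_sample, bigquery_sample))
```

Equal counts with non-zero mismatch figures usually point to a type or formatting difference rather than lost rows, which is why both numbers are reported.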
Run dual-writes, backfill lagging rows, freeze writes to Snowflake, rerun the final incremental load, then switch applications to BigQuery.
Compress exports (GZIP), partition large tables by date in BigQuery, script every step with CI, and tag datasets with migration date.
Skipping type mapping leaves columns loaded as untyped STRING blobs. Forgetting to set --field_delimiter="|" when the export is pipe-delimited creates broken rows, because bq load defaults to a comma.
Automate, verify, and monitor each stage to ensure a predictable, lossless migration.
Yes. BigQuery Migration Service automates Snowflake exports, schema conversion, and incremental syncs.
With parallel COPY INTO (16 threads) and gsutil -m, expect ~45 minutes export and ~30 minutes load, network permitting.
Absolutely. Suspend the Snowflake warehouse to stop compute charges; the historical data stays in place and you pay only for storage.