Move data, schema, and workloads from an on-premises PostgreSQL cluster to Snowflake’s cloud data platform using staged files, COPY INTO, and validation steps.
The migration consists of exporting the on-prem tables, staging the files, recreating the schema in Snowflake, bulk-loading with COPY INTO, validating row counts, and switching your applications to the new endpoint.
1️⃣ Assess schema & extensions (see the inventory queries after this list).
2️⃣ Export data to CSV/Parquet with COPY.
3️⃣ Upload files to an internal or external Snowflake stage.
4️⃣ Recreate tables in Snowflake.
5️⃣ Load data with COPY INTO.
6️⃣ Re-create constraints, views, and grants.
7️⃣ Run validation queries.
8️⃣ Cut over traffic and monitor.
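For step 1️⃣, a quick inventory of installed extensions and table sizes flags features with no direct Snowflake equivalent (PostGIS, custom types, triggers) before any data moves. A minimal sketch, run with psql against the source database:

    -- List installed extensions; anything beyond core SQL needs a migration plan.
    SELECT extname, extversion FROM pg_extension ORDER BY extname;

    -- List user tables with approximate row counts and on-disk size.
    SELECT relname AS table_name,
           n_live_tup AS approx_rows,
           pg_size_pretty(pg_total_relation_size(relid)) AS total_size
    FROM pg_stat_user_tables
    ORDER BY pg_total_relation_size(relid) DESC;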
Run psql -c "\COPY Orders TO '/tmp/orders.csv' CSV HEADER" on the on-prem server. Repeat for Customers, Products, and OrderItems.
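If you prefer one scripted pass over four manual commands, the exports can be batched in a psql script; a minimal sketch (the script name is illustrative, the /tmp paths and table names follow the example above):

    -- export_tables.sql: run with psql -d <source_db> -f export_tables.sql
    -- \COPY runs client-side, so the files land on the machine running psql.
    \COPY Orders     TO '/tmp/orders.csv'     CSV HEADER
    \COPY Customers  TO '/tmp/customers.csv'  CSV HEADER
    \COPY Products   TO '/tmp/products.csv'   CSV HEADER
    \COPY OrderItems TO '/tmp/orderitems.csv' CSV HEADER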
In Snowflake, run CREATE OR REPLACE STAGE ecommerce_stage; to create an internal stage, or configure an external S3/GCS/Azure stage.
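With an internal stage, the exported files still have to be uploaded before COPY INTO can see them; a minimal sketch using SnowSQL’s PUT command (paths and stage name follow the examples above):

    -- Run from SnowSQL on the machine holding the exports.
    -- AUTO_COMPRESS=TRUE (the default) gzips on upload, so the staged name becomes orders.csv.gz;
    -- adjust the COPY path accordingly or pass AUTO_COMPRESS=FALSE to keep the original name.
    PUT file:///tmp/orders.csv     @ecommerce_stage AUTO_COMPRESS=TRUE PARALLEL=8;
    PUT file:///tmp/customers.csv  @ecommerce_stage AUTO_COMPRESS=TRUE PARALLEL=8;
    PUT file:///tmp/products.csv   @ecommerce_stage AUTO_COMPRESS=TRUE PARALLEL=8;
    PUT file:///tmp/orderitems.csv @ecommerce_stage AUTO_COMPRESS=TRUE PARALLEL=8;
    LIST @ecommerce_stage;  -- confirm the uploads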
Use COPY INTO: COPY INTO Orders FROM @ecommerce_stage/orders.csv FILE_FORMAT=(TYPE=CSV SKIP_HEADER=1);
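COPY INTO assumes the target table already exists (step 4️⃣). The columns below are hypothetical; the point is the type mapping when translating the Postgres DDL. A minimal sketch:

    CREATE OR REPLACE TABLE Orders (
        order_id     NUMBER        NOT NULL,  -- was SERIAL; see the sequence FAQ below for the default
        customer_id  NUMBER        NOT NULL,  -- was INTEGER REFERENCES Customers
        order_total  NUMBER(12,2)  NOT NULL,  -- was NUMERIC(12,2)
        status       VARCHAR(32),             -- was TEXT
        created_at   TIMESTAMP_TZ             -- was TIMESTAMPTZ
    );

Note that Snowflake records PRIMARY KEY and FOREIGN KEY constraints as metadata but does not enforce them (only NOT NULL is enforced), so step 6️⃣ adds them for documentation and tooling rather than integrity.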
Export the initial full load, switch on logical replication (e.g., Debezium ➜ Kafka ➜ Snowpipe) for catch-up changes, then finalize.
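On the Snowflake side of that pipeline, the catch-up loads are typically a pipe over the external stage the Kafka sink writes to. The stage name, path, and landing table below are assumptions, not part of the setup above; a minimal sketch:

    -- Assumes Debezium/Kafka lands change events as JSON files in an external stage named cdc_stage.
    -- Orders_changes is a landing table with a single VARIANT column holding each raw change event.
    CREATE OR REPLACE PIPE orders_cdc_pipe
      AUTO_INGEST = TRUE
      AS
      COPY INTO Orders_changes
      FROM @cdc_stage/orders/
      FILE_FORMAT = (TYPE = JSON);
    -- A scheduled task can then MERGE inserts/updates/deletes from Orders_changes into Orders.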
Compare row counts on both sides, for example SELECT 'orders' AS table_name, COUNT(*) FROM Orders; in Snowflake and SELECT COUNT(*) FROM public.orders; against the source. Snowflake cannot query the on-prem database directly, so collect the counts from each system separately and compare them in a script or spreadsheet.
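Counts catch missing rows but not truncated or mangled values; comparing a few cheap aggregates per table adds confidence, and the same statement runs unchanged on both systems (column names follow the earlier Orders sketch and are illustrative):

    -- Run identically in PostgreSQL and Snowflake, then diff the two result sets.
    SELECT COUNT(*)         AS row_count,
           MIN(order_id)    AS min_id,
           MAX(order_id)    AS max_id,
           SUM(order_total) AS total_amount
    FROM Orders;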
• Export to compressed Parquet to reduce load time (see the load sketch after these tips).
• Match column collations & time zones.
• Use parallel uploads (e.g., SnowSQL PUT with a higher PARALLEL setting) or Snowpipe for large files.
• Automate validation with dbt or pytest.
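For the Parquet tip, loading differs slightly from CSV because Parquet files carry named columns; a minimal sketch, assuming the files were produced by an export tool such as Spark or DuckDB and uploaded under an orders_parquet/ prefix on the same stage:

    COPY INTO Orders
    FROM @ecommerce_stage/orders_parquet/
    FILE_FORMAT = (TYPE = PARQUET)
    MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;  -- map Parquet field names onto table columns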
Cut over after validation passes, replication lag is zero, and connection strings are updated in a feature flag or YAML config for easy rollback.
Yes. Do a full export, enable change data capture via logical replication, stream deltas into Snowflake with Snowpipe, then switch once lag is zero.
No. Re-create sequences as Snowflake SEQUENCE objects and set column defaults to the Snowflake equivalent of nextval(), i.e. <sequence_name>.NEXTVAL.
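A minimal sketch, assuming Orders.order_id was fed by a SERIAL sequence in Postgres; start the Snowflake sequence above the source’s current position so new rows don’t collide with migrated ids, and fold the DEFAULT into the step 4️⃣ DDL so it is in place before cutover:

    -- In PostgreSQL, find the current position (e.g. SELECT last_value FROM orders_order_id_seq;).
    CREATE OR REPLACE SEQUENCE orders_id_seq START = 100001 INCREMENT = 1;  -- START > source last_value
    CREATE OR REPLACE TABLE Orders (
        order_id    NUMBER DEFAULT orders_id_seq.NEXTVAL,
        customer_id NUMBER NOT NULL
        -- remaining columns as in the DDL sketch above
    );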
Convert geometries to WKT or WKB during export, then load them into Snowflake VARIANT or GEOGRAPHY columns; Snowflake has no user-managed spatial indexes, so queries rely on its built-in geospatial functions rather than rebuilt PostGIS indexes.
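A minimal sketch of the WKT route, assuming a PostGIS table named locations with a geometry column geom (hypothetical names); landing the WKT in a VARCHAR staging column avoids relying on implicit conversion during COPY:

    -- In PostgreSQL (PostGIS): export geometries as WKT alongside the other columns.
    -- \COPY (SELECT id, name, ST_AsText(geom) AS geom_wkt FROM locations) TO '/tmp/locations.csv' CSV HEADER

    -- In Snowflake: land the WKT as text, then convert into a GEOGRAPHY column.
    CREATE OR REPLACE TABLE locations_stg (id NUMBER, name VARCHAR, geom_wkt VARCHAR);
    COPY INTO locations_stg FROM @ecommerce_stage/locations.csv FILE_FORMAT=(TYPE=CSV SKIP_HEADER=1);
    CREATE OR REPLACE TABLE locations AS
    SELECT id, name, TO_GEOGRAPHY(geom_wkt) AS geom FROM locations_stg;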