Staging Fivetran Data into Snowflake

Galaxy Glossary

How do I stage data from Fivetran into Snowflake?

Staging Fivetran data in Snowflake involves configuring secure pipelines that land raw extracts from diverse sources into Snowflake schemas, allowing analytics teams to query and transform the data efficiently.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Description

Definition

Staging data from Fivetran into Snowflake is the process whereby Fivetran-managed connectors extract data from source systems, land the files in an intermediate cloud storage bucket, and then load them—via Snowpipe—into Snowflake tables that live in a dedicated <source>_RAW schema. The staged tables preserve the original structure and change data, providing a reliable, auditable foundation for downstream modeling.

Why It Matters

Modern data stacks rely on automated ingestion and scalable cloud warehouses. Fivetran eliminates EL (extract-load) engineering work, while Snowflake offers elastic storage and compute. Together they enable:

  • Zero-maintenance ingestion — connectors automatically handle schema drift, incremental loads, and backfills.
  • Near-real-time analytics — Snowpipe streams arrivals into tables within minutes.
  • Governance & auditing — Time-Travel and Fail-Safe let teams recover or inspect earlier states of the raw data.

How Fivetran Loads Data into Snowflake

1. Extraction

Fivetran connectors query the source (e.g., MySQL, Salesforce, Stripe) and write JSON or CSV files to a Fivetran-managed S3, GCS, or Azure Blob bucket.

2. Staging

Fivetran creates an EXTERNAL STAGE in Snowflake that points to the bucket and sets up an AUTO_INGEST Snowpipe. A service managed by Fivetran posts cloud-storage events so Snowflake knows when to ingest new files.

3. Loading

Snowpipe executes COPY INTO commands that stream the data into raw tables inside a schema such as FIVETRAN_DB.SALESFORCE_RAW. Fivetran adds metadata columns (e.g., _FIVETRAN_SYNCED, _FIVETRAN_DELETED).

4. Post-Processing

Optional dbt jobs or Snowflake Tasks model the staged data into analytics-ready marts.

Step-by-Step Setup

  1. Create a Destination in Fivetran and choose Snowflake.
  2. In Snowflake create a role, user, and database for Fivetran:CREATE ROLE FIVETRAN;
    GRANT ROLE FIVETRAN TO USER fivetran_user;
    CREATE DATABASE FIVETRAN_DB;
    GRANT ALL ON DATABASE FIVETRAN_DB TO ROLE FIVETRAN;
  3. Provide the account URL, warehouse, user, and RSA public key to Fivetran.
  4. Configure connectors; each will autocreate its own schema (SHOPIFY_RAW, POSTGRES_RAW, etc.).
  5. Fivetran spins up the cloud storage bucket, stage, and Snowpipe. No Snowflake admin action is needed beyond initial grants.

Inside the RAW Schema

Key characteristics:

  • One-to-one table mapping — source tables/objects become Snowflake tables.
  • Soft deletes — rows removed in the source are flagged by _FIVETRAN_DELETED = TRUE.
  • Incremental columns such as _FIVETRAN_SYNCED mark the last successful load.
  • Primary keys are replicated to _FIVETRAN_ID if the original table lacks one.

Best Practices

Use Dedicated Warehouses

Point Fivetran at a small, cost-effective warehouse (e.g., XS) since loads are frequent but lightweight.

Isolate Permissions

Grant USAGE and CREATE only on the dedicated database. Analysts should read from modeled schemas, not RAW.

Tag and Monitor Costs

Apply TAGS on the Fivetran warehouse and pipe to track credit consumption. Use ACCOUNT_USAGE.PIPE_USAGE_HISTORY for monitoring.

Time-Travel Window

The RAW data is immutable; a 1- to 3-day DATA_RETENTION_TIME_IN_DAYS usually suffices.

Working with Staged Data

To explore the latest state of a table:

SELECT *
FROM FIVETRAN_DB.SALESFORCE_RAW.OPPORTUNITY
WHERE _FIVETRAN_DELETED = FALSE
ORDER BY _FIVETRAN_SYNCED DESC
LIMIT 100;

For slowly changing dimensions, leverage Snowflake streams:

CREATE OR REPLACE STREAM OPPORTUNITY_CHANGES ON TABLE FIVETRAN_DB.SALESFORCE_RAW.OPPORTUNITY;

Common Mistakes & How to Fix Them

1. Querying RAW in Dashboards

The RAW schema contains soft deletes and subtly changing column types. Always transform into a clean ANALYTICS layer (dbt is ideal).

2. Over-granting Privileges

Giving analysts OWNERSHIP on RAW objects lets them accidentally drop tables, breaking the connector. Stick to SELECT.

3. Ignoring Warehouse Auto-Suspend

Leaving the Fivetran warehouse running 24/7 wastes credits. Set AUTO_SUSPEND = 60 and AUTO_RESUME = TRUE.

Where Galaxy Fits In

Once data is staged, engineers need to discover column names, write tests, and iterate on transforms. Galaxy’s lightning-fast SQL editor and context-aware AI copilot accelerate this workflow:

  • Auto-completes SALESFORCE_RAW table metadata as soon as Fivetran creates it.
  • Suggests joins and filter predicates by reading column statistics.
  • Collections let analytics engineers share vetted staging queries instead of pasting samples in Slack.

Conclusion

Staging data from Fivetran into Snowflake replaces brittle in-house pipelines with a fully managed, secure, and near-real-time system. By following least-privilege roles, warehouse sizing best practices, and modeling raw tables into clean marts, teams unlock rapid analytics while controlling cost. Modern editors like Galaxy further streamline query development and collaboration on top of the staged data.

Why Staging Fivetran Data into Snowflake is important

Fivetran plus Snowflake removes the EL burden and enables near-real-time analytics. Understanding how data lands in the RAW schema, what metadata columns mean, and how to secure and monitor the pipeline ensures reliability, cost control, and accurate downstream reporting. Without this knowledge, teams risk corrupted data models, runaway warehouse spend, and compliance gaps.

Staging Fivetran Data into Snowflake Example Usage



Common Mistakes

Frequently Asked Questions (FAQs)

How often does Fivetran load data into Snowflake?

The default sync frequency is 15 minutes, but many connectors support 5-minute intervals or real-time CDC. You can adjust the schedule per connector in the Fivetran UI.

Does Fivetran use Snowflake stages and Snowpipe?

Yes. Fivetran creates an EXTERNAL STAGE that points to its cloud bucket and an AUTO_INGEST Snowpipe to stream files into tables.

How can I reduce Snowflake costs caused by Fivetran loads?

Assign Fivetran a small warehouse with AUTO_SUSPEND, monitor PIPE_USAGE_HISTORY, and limit DATA_RETENTION_TIME_IN_DAYS in RAW schemas.

How does Galaxy help when working with Fivetran-staged data?

Galaxy auto-discovers the new RAW schemas, offers AI-driven autocomplete, and lets teams share vetted staging queries in Collections, streamlining exploration and transformation.

Want to learn about other SQL terms?