dbt snapshot

What is a dbt snapshot and how is it used for slowly-changing dimensions?

A dbt snapshot captures and versions source table rows over time, letting analysts build slowly-changing dimension tables directly in dbt.

Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)

Oops! Something went wrong while submitting the form.

Description

Example H2

Example H3

What Is dbt snapshot?

A dbt snapshot is an incremental, versioned copy of a source table that stores every historical change to each row, enabling slowly changing dimensions (SCD) inside your warehouse.

Why Use dbt snapshots for SCD?

Snapshots remove complex hand-written SQL by auto-generating surrogate keys, effective dates, and current row flags so analysts can time-travel facts with simple SELECTs.

How Does dbt snapshot Work Internally?

During a run, dbt compares the current source query to the last snapshot table.New or changed rows are upserted with dbt_valid_from and dbt_valid_to timestamps.

What Modes Can I Configure?

"check" mode hashes selected columns to detect changes. "timestamp" mode uses an updated_at column.Both support unique_key, strategy, and invalidate_hard_deletes options.

How Do I Create a dbt Snapshot Step-By-Step?

Define a .sql snapshot file, add a .yml config, run dbt snapshot, then reference the generated table in downstream models.

Code Example: Basic Snapshot YAML

snapshots: - name: customers target_schema: analytics strategy: check unique_key: id check_cols: [name, email]

How To Query Snapshot Tables?

Join the snapshot to fact tables on surrogate_key or business key and filter on dbt_valid_from <= order_date < dbt_valid_to to get row state at order time.

Best Practices for dbt snapshots

Select only business keys and mutable columns, schedule snapshots after source loads, and partition large tables for faster diffing.

Common dbt snapshot Pitfalls

Hashing large JSON columns, forgetting unique_key uniqueness, and missing updated_at indexes slow snapshot runs.

How Does dbt snapshot Integrate With Galaxy?

Galaxy’s SQL editor autocompletes dbt_valid_from fields, lets teams endorse snapshot queries, and the AI copilot writes SCD filters in one click.

Why dbt snapshot is important

Snapshots centralize SCD logic inside the dbt DAG, eliminating custom ETL code and ensuring analysts query the same historical truth. Snapshots keep storage cheap by inserting only changed rows, which lowers warehouse costs while preserving full change history. Versioned data improves regulatory compliance and auditing by providing a tamper-proof record of how each field evolved over time.

dbt snapshot Example Usage


SELECT * FROM  WHERE dbt_valid_from <= '2023-12-31' AND dbt_valid_to > '2023-12-31';

dbt snapshot Syntax

Common Mistakes

Mistake: Omitting a stable unique_key causes duplicate history rows. Fix by choosing a natural business key or adding a surrogate key column.
Mistake: Hashing every column in check mode balloons runtime. Fix by limiting check_cols to fields that actually mutate.
Mistake: Running snapshots before sources finish loading yields partial history. Fix by scheduling the dbt snapshot job after ETL completes.