A dbt snapshot captures and versions source table rows over time, letting analysts build slowly-changing dimension tables directly in dbt.
A dbt snapshot is an incremental, versioned copy of a source table that stores every historical change to each row, enabling slowly changing dimensions (SCD) inside your warehouse.
Snapshots remove complex hand-written SQL by auto-generating surrogate keys, effective dates, and current row flags so analysts can time-travel facts with simple SELECTs.
During a run, dbt compares the current source query to the last snapshot table.New or changed rows are upserted with dbt_valid_from and dbt_valid_to timestamps.
"check" mode hashes selected columns to detect changes. "timestamp" mode uses an updated_at column.Both support unique_key, strategy, and invalidate_hard_deletes options.
Define a .sql snapshot file, add a .yml config, run dbt snapshot
, then reference the generated table in downstream models.
snapshots:
- name: customers
target_schema: analytics
strategy: check
unique_key: id
check_cols: [name, email]
Join the snapshot to fact tables on surrogate_key or business key and filter on dbt_valid_from <= order_date < dbt_valid_to
to get row state at order time.
Select only business keys and mutable columns, schedule snapshots after source loads, and partition large tables for faster diffing.
Hashing large JSON columns, forgetting unique_key uniqueness, and missing updated_at indexes slow snapshot runs.
Galaxy’s SQL editor autocompletes dbt_valid_from
fields, lets teams endorse snapshot queries, and the AI copilot writes SCD filters in one click.
.
Snapshots centralize SCD logic inside the dbt DAG, eliminating custom ETL code and ensuring analysts query the same historical truth. Snapshots keep storage cheap by inserting only changed rows, which lowers warehouse costs while preserving full change history. Versioned data improves regulatory compliance and auditing by providing a tamper-proof record of how each field evolved over time.
Run after each successful source load—daily for batch pipes or every few minutes for streaming inputs.
Yes. Point the source query at older partitions and rerun dbt snapshot
; dbt appends missing versions.
Diffing large tables can be expensive. Use partitions, indexes, and column pruning to keep costs low.
Galaxy’s AI copilot suggests snapshot filters and autogenerates temporal join SQL, speeding up SCD analysis.