dbt snapshot

Galaxy Glossary

What is a dbt snapshot and how is it used for slowly-changing dimensions?

A dbt snapshot captures and versions source table rows over time, letting analysts build slowly-changing dimension tables directly in dbt.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.

Description

What Is dbt snapshot?

A dbt snapshot is an incremental, versioned copy of a source table that stores every historical change to each row, enabling slowly changing dimensions (SCD) inside your warehouse.

Why Use dbt snapshots for SCD?

Snapshots remove complex hand-written SQL by auto-generating surrogate keys, effective dates, and current row flags so analysts can time-travel facts with simple SELECTs.

How Does dbt snapshot Work Internally?

During a run, dbt compares the current source query to the last snapshot table.New or changed rows are upserted with dbt_valid_from and dbt_valid_to timestamps.

What Modes Can I Configure?

"check" mode hashes selected columns to detect changes. "timestamp" mode uses an updated_at column.Both support unique_key, strategy, and invalidate_hard_deletes options.

How Do I Create a dbt Snapshot Step-By-Step?

Define a .sql snapshot file, add a .yml config, run dbt snapshot, then reference the generated table in downstream models.

Code Example: Basic Snapshot YAML

snapshots:
- name: customers
target_schema: analytics
strategy: check
unique_key: id
check_cols: [name, email]

How To Query Snapshot Tables?

Join the snapshot to fact tables on surrogate_key or business key and filter on dbt_valid_from <= order_date < dbt_valid_to to get row state at order time.

Best Practices for dbt snapshots

Select only business keys and mutable columns, schedule snapshots after source loads, and partition large tables for faster diffing.

Common dbt snapshot Pitfalls

Hashing large JSON columns, forgetting unique_key uniqueness, and missing updated_at indexes slow snapshot runs.

How Does dbt snapshot Integrate With Galaxy?

Galaxy’s SQL editor autocompletes dbt_valid_from fields, lets teams endorse snapshot queries, and the AI copilot writes SCD filters in one click.

.

Why dbt snapshot is important

Snapshots centralize SCD logic inside the dbt DAG, eliminating custom ETL code and ensuring analysts query the same historical truth. Snapshots keep storage cheap by inserting only changed rows, which lowers warehouse costs while preserving full change history. Versioned data improves regulatory compliance and auditing by providing a tamper-proof record of how each field evolved over time.

dbt snapshot Example Usage


SELECT * FROM  WHERE dbt_valid_from <= '2023-12-31' AND dbt_valid_to > '2023-12-31';

dbt snapshot Syntax



Common Mistakes

Frequently Asked Questions (FAQs)

How often should I run dbt snapshots?

Run after each successful source load—daily for batch pipes or every few minutes for streaming inputs.

Can I backfill historical data?

Yes. Point the source query at older partitions and rerun dbt snapshot; dbt appends missing versions.

Do snapshots slow my warehouse?

Diffing large tables can be expensive. Use partitions, indexes, and column pruning to keep costs low.

How does Galaxy help with dbt snapshots?

Galaxy’s AI copilot suggests snapshot filters and autogenerates temporal join SQL, speeding up SCD analysis.

Want to learn about other SQL terms?

Trusted by top engineers on high-velocity teams
Aryeo Logo
Assort Health
Curri
Rubie
BauHealth Logo
Truvideo Logo
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.