A dbt source is a YAML-defined reference to raw tables or views in an upstream database, enabling lineage, testing, and freshness tracking before transformation.
A dbt source is a metadata object that points to an existing raw table or view in your warehouse, letting dbt treat it as an upstream dependency for models, tests, and documentation.
Referencing sources keeps database, schema, and table names in one place, enables automatic lineage graphs, and allows freshness checks. These benefits are lost when table names are hard-coded in SQL.
Create a YAML file (e.g., src.yml) under models/ and add:
version: 2
sources:
  - name: stripe
    database: raw
    schema: payments
    tables:
      - name: charges
Teams use sources for ingest pipelines, third-party SaaS exports, and application replica tables where transformations must track back to raw data.
Add a loaded_at_field and freshness block to set acceptable lag. Running dbt source freshness alerts if data is stale.
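As a minimal sketch of such a block, the stripe source from earlier could be extended as follows. The column name _loaded_at and the 12-hour and 24-hour thresholds are assumptions for illustration, not values from this article, and the exact placement of these keys may differ between dbt versions.

```yaml
version: 2

sources:
  - name: stripe
    database: raw
    schema: payments
    # Assumed column: a load timestamp written by the ingestion tool
    loaded_at_field: _loaded_at
    freshness:
      # Illustrative thresholds: warn after 12 hours, fail after 24 hours
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: charges
```

With a configuration like this, dbt source freshness compares the most recent _loaded_at value against the current time and reports a warning or an error once the corresponding threshold is exceeded.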
Define tests in the same YAML file:
- name: charges
  tests:
    - unique:
        column_name: id
    - not_null:
        column_name: id
Group sources by system, keep YAML small, enforce freshness in CI, and document each field so analysts understand raw structures.
Galaxy’s AI SQL copilot autocompletes source() references, surfaces table metadata, and lets teams endorse source-aligned queries so everyone uses the same raw objects.
1) Declare the YAML above.
2) Run dbt run; models referencing {{ source('stripe','charges') }} now compile.
3) Run dbt docs generate, then dbt docs serve, to visualize lineage.
dbt sources provide auditable lineage from raw data to refined models, making debugging faster and compliance reporting easier. Tracking freshness ensures stakeholders trust data recency, while central tests reduce silent schema drift. For analytics teams, mastering sources means reliable pipelines and clear ownership boundaries between ingestion and transformation layers.
Yes. Any existing relation—table or view—can be declared as a source as long as dbt has read access.
Incremental logic lives in downstream models. The source itself stays static, pointing to raw data.
Galaxy’s editor parses your sources YAML and offers AI-generated source() snippets as you type, reducing typos.
Absolutely. Use columns: blocks under each table to attach not_null, accepted_values, or custom tests.
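A column-level version of the earlier tests might look like the sketch below. The status column and its accepted values are hypothetical and only illustrate the shape of an accepted_values test; substitute the columns that actually exist in your raw table.

```yaml
version: 2

sources:
  - name: stripe
    database: raw
    schema: payments
    tables:
      - name: charges
        columns:
          - name: id
            tests:
              - unique
              - not_null
          # Hypothetical column and values, shown only for illustration
          - name: status
            tests:
              - accepted_values:
                  values: ['succeeded', 'pending', 'failed']
```

Tests declared this way run with dbt test, alongside the tests defined on models.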