A dbt model is a SQL file that transforms raw data into analysis-ready tables or views, orchestrated and version-controlled by dbt.
In dbt, a model is a standalone SQL file that compiles to a SELECT statement and materializes as a table, view, or incremental table in your warehouse. Models codify business logic, live in Git, and form the backbone of the dbt dependency graph.
During dbt run
, dbt resolves Jinja syntax, orders models using ref()
dependencies, and executes the resulting SQL.The materialized
config decides whether dbt builds a view, table, or incremental load.
Models enable version-controlled, testable transformations that replace brittle ad-hoc SQL. They ensure data lineage, support CI/CD, and encourage software-engineering practices in analytics.
View: fast builds, no storage cost. Table: persisted, faster downstream queries. Incremental: processes only new data for big tables.Ephemeral: inlined sub-queries to avoid temp objects.
1) Add a SQL file to models/
. 2) Write a SELECT transforming source tables. 3) Reference upstream models with {{ ref('model_name') }}
. 4) Configure materialization in dbt_project.yml
or with a Jinja block. 5) Run dbt run --select your_model
.
Use sources:
in schema.yml
to register raw tables.Refer to them with {{ source('app_db','orders') }}
.
Create stg_*
models that rename columns and apply light typing. These provide consistent contracts for downstream logic.
Layer business metrics, joins, and calculations atop staging models.Name files fct_*
, dim_*
, or int_*
for clarity.
Keep models idempotent, prefer narrow SELECTs, document with schema.yml
, and add tests (unique
, not_null
) to enforce data quality.
{{ config(materialized='incremental', unique_key='id') }}
SELECT *
FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
WHERE order_date > (SELECT MAX(order_date) FROM {{ this }})
{% endif %}
Yes.Galaxy’s desktop SQL editor understands Jinja syntax, offers context-aware autocompletion for ref()
and source()
, and lets teams share, review, and endorse dbt model queries without leaving the IDE.
Hard-coding database names breaks portability—use ref()
and source()
. Missing tests hides data drift—add built-in dbt tests. Running full refreshes on huge tables wastes time—switch to incremental materialization.
.
Data teams waste hours rewriting SQL across BI tools. dbt models centralize transformation logic in version-controlled files, making data pipelines reproducible and reviewable. They enable CI/CD, automated testing, and clear lineage, reducing errors and accelerating insight delivery.
Use views for lightweight transforms and rapid iteration; choose tables for performance-critical queries that read large datasets repeatedly.
Yes. Use Jinja, environment variables, or the var()
function to pass dynamic values into SQL.
Galaxy’s AI copilot autocompletes ref()
calls, suggests tests, and allows teams to endorse shared dbt queries, streamlining collaboration.
Built-in generic tests (unique
, not_null
), assertions via expectations
packages, and custom Python tests ensure data quality.