In dbt, a model is a SQL or Python file that defines a single SELECT query whose result becomes a view, table, or other materialized object in the warehouse.
In dbt (data build tool), a model is a .sql
or .py
file that contains the business logic used to transform raw data into clean, analytics-ready datasets. Each model compiles to a single SELECT
statement and can be materialized as a view, table, incremental table, or ephemeral CTE.
Modern analytics stacks separate data movement (ELT) from transformation. dbt owns the T layer, and models are its fundamental building block. Because models are:
ref
-ing other models.Located in your models/
directory, a model typically contains:
-- models/orders.sql
{{config(materialized='table')}}
SELECT *
FROM {{ ref('raw_orders') }}
The {{config}}
block controls how the query result is stored:
ref()
Instead of hard-coding physical table names, you call ref('model_name')
. dbt builds a dependency graph, orders execution, and renames objects safely through environment changes.
Use patterns such as staging → intermediate → mart to keep logic readable and reusable.
One model = one transformation concern. If a query exceeds ~200 lines or mixes domains, split it up.
Add unique
, not_null
, and accepted_values
tests in YAML. Document both columns and models so analysts understand provenance.
Large fact tables benefit from incremental loads. Dimension tables change less; stick to full-refresh tables or views for transparency.
-- models/stg_orders.sql
SELECT
id AS order_id,
user_id,
status,
amount_cents / 100.0 AS order_total,
created_at::date AS order_date
FROM {{ source('stripe','orders_raw') }}
-- models/fct_revenue.sql
{{config(materialized='incremental')}}
WITH orders AS (
SELECT *
FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
WHERE order_date > (SELECT MAX(order_date) FROM {{ this }})
{% endif %}
)
SELECT
order_date,
SUM(order_total) AS gross_revenue
FROM orders
GROUP BY 1;
Running dbt run --select fct_revenue
creates or updates the fct_revenue
table efficiently.
Because a dbt model is ultimately SQL, you can author and test model queries inside Galaxy’s desktop SQL editor. Galaxy’s AI Copilot autocompletes ref()
calls, infers column metadata, and optimizes queries before you commit them to your dbt repository. After deploying with dbt, you can still query the materialized objects from Galaxy and share validated examples with teammates via Galaxy Collections.
dbt build --select model_name --full-refresh
This forces a clean rebuild if schema drift or incremental logic causes errors.
dbt logs compiled SQL in target/compiled/
. Examine it to confirm Jinja blocks render as expected.
Test failures pinpoint rows violating expectations, saving time compared to manual debugging.
dbt models bring software-engineering rigor to data transformation workflows. By treating SQL as code—complete with modular design, dependency management, testing, and documentation—teams gain confidence in the accuracy and maintainability of their analytics pipelines. Whether edited in a code IDE or a modern SQL tool like Galaxy, mastering dbt models is a foundational skill for any analytics engineer.
dbt models encapsulate transformation logic as version-controlled code, enabling teams to apply software engineering best practices like modularity, testing, and CI/CD to analytics. They form the backbone of reliable, scalable data pipelines.
dbt builds a directed acyclic graph (DAG) based on ref()
dependencies. It then executes models in topological order so upstream models finish before dependents.
Yes. Galaxy connects directly to your warehouse, letting you prototype model SQL with auto-completion and lineage context. Once satisfied, paste the query into a dbt model file or sync via Git.
A view is a persisted database object whose logic executes on each query, whereas an ephemeral model compiles into the SQL of downstream models and never exists as its own object in the warehouse.
Use incremental models for large fact tables where only new or changed rows need processing—commonly event streams or append-only transactional data.