An incremental model in dbt is a table-building strategy that appends or updates only the new or changed data since the last run, instead of rebuilding the entire dataset.
Incremental Models in dbt: Build Faster Pipelines by Processing Only New Data
Learn what an incremental model is, how it works in dbt, and how to implement it correctly—complete with best practices, common pitfalls, and practical SQL examples you can run today.
An incremental model in dbt (data build tool) is a type of model that, after its first full run, processes only the new or changed data on subsequent runs and merges it into the existing table. Instead of truncating and re-creating the table, dbt issues INSERT, MERGE, or UPDATE statements, depending on your adapter and configuration. This dramatically reduces runtime and compute costs for large datasets while maintaining data freshness.
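For example, with the merge strategy and a unique_key of order_id, the upsert dbt issues behind the scenes looks roughly like the sketch below. The table and column names are illustrative and the exact statement varies by adapter and strategy; dbt first stages the new rows in a temporary relation, then merges them into the target.

merge into analytics.fct_orders as dest
using analytics.fct_orders__dbt_tmp as src   -- temp relation holding only the new or changed rows
    on dest.order_id = src.order_id          -- match on the configured unique_key
when matched then update set                 -- changed rows overwrite the existing version
    status = src.status,
    total_amount = src.total_amount,
    updated_at = src.updated_at
when not matched then insert                  -- brand-new rows are appended
    (order_id, status, total_amount, updated_at)
    values (src.order_id, src.status, src.total_amount, src.updated_at)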
Modern analytics stacks often work with ever-growing event streams, logs, or transactional data. Rebuilding a large fact table every hour quickly becomes impractical. Incremental models address three key challenges: long runtimes, rising compute costs, and stale data.
You declare an incremental model in the model’s .sql file using the materialized="incremental" config:
{{
  config(
    materialized = "incremental",
    unique_key = "order_id",
    incremental_strategy = "merge"
  )
}}
The unique_key is optional but recommended, and the default incremental_strategy varies by adapter.
On the first execution, dbt creates the table by running the entire select statement and writing the full result set to your warehouse.
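Under the hood, that first build is roughly a create-table-as-select. The exact DDL depends on your adapter, and the schema and table names below are only illustrative:

create table analytics.fct_orders as (
    select *            -- the model's full select, with no incremental filter applied
    from raw.orders
)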
is_incremental()
Inside your model, you can wrap logic with the is_incremental() Jinja function to run code only during incremental runs. Example:
SELECT *
FROM {{ source('raw', 'orders') }} o
{% if is_incremental() %}
WHERE o.updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
During a full refresh, the WHERE clause is skipped; on incremental runs, dbt fetches only rows with a higher updated_at than what’s already in the target table.
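To make that concrete, here is roughly what the query above compiles to in each mode, assuming the source resolves to raw.orders and {{ this }} resolves to analytics.fct_orders:

-- First run or --full-refresh: is_incremental() is false, so the filter compiles away
SELECT *
FROM raw.orders o

-- Subsequent incremental runs: only rows past the current high-water mark are selected
SELECT *
FROM raw.orders o
WHERE o.updated_at > (SELECT MAX(updated_at) FROM analytics.fct_orders)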
dbt hands the result set to the adapter-specific incremental strategy:
- merge: uses a MERGE (or INSERT ... ON CONFLICT) statement to upsert rows based on unique_key.
- delete+insert or insert_overwrite: removes the affected rows or partitions and re-inserts the new slice, used on warehouses where MERGE isn't supported.

To implement an incremental model:

1. Pick an update column: updated_at or an auto-incrementing id.
2. Create the model file models/fct_orders.sql and add the config block shown above.
3. Add is_incremental(): filter the source data to only the new slice.
4. Set unique_key: declare the column(s) that uniquely identify each record. This enables de-duplication on upserts.
5. Run dbt run -s fct_orders. The first run builds the full table.

Use a strictly increasing timestamp or sequence to avoid missing late-arriving data. If your source sends corrections, combine updated_at filtering with a MERGE strategy and a unique_key.
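One common way to handle late-arriving or corrected rows is to re-scan a short lookback window and let the merge strategy de-duplicate on unique_key. A sketch, assuming a Snowflake-style DATEADD function (date arithmetic syntax varies by warehouse) and a three-day window:

{% if is_incremental() %}
-- Reprocess the last 3 days so late or corrected rows are picked up;
-- the merge on unique_key overwrites any rows that were already loaded.
WHERE updated_at > (SELECT DATEADD('day', -3, MAX(updated_at)) FROM {{ this }})
{% endif %}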
Retrieving MAX(updated_at) can strain large tables. Instead, store your last run timestamp in a variable or use partitioned tables and query only the latest partition.
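For example, you can pass the watermark in as a dbt variable instead of scanning the target table. The variable name incremental_cutoff and its fallback date below are illustrative, not a dbt built-in:

{% if is_incremental() %}
-- Use an externally supplied watermark instead of querying MAX(updated_at) from the target
WHERE updated_at > '{{ var("incremental_cutoff", "1900-01-01") }}'
{% endif %}

You would then supply the value at runtime, e.g. dbt run -s fct_orders --vars '{"incremental_cutoff": "2024-06-01"}'.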
Run dbt run --full-refresh -s fct_orders periodically (e.g., weekly) to ensure that your incremental logic still produces deterministic results.
Add descriptions, columns, and config to your YAML files so teammates understand the incremental assumptions. Galaxy’s rich metadata panel will surface these details directly while you edit SQL.
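A minimal sketch of what that could look like in models/schema.yml, reusing the names from the fct_orders example:

version: 2
models:
  - name: fct_orders
    description: "Incremental orders fact; merges new and changed rows on order_id using the updated_at watermark."
    columns:
      - name: order_id
        description: "Unique order identifier; configured as the incremental unique_key."
        tests:
          - unique
          - not_null
      - name: updated_at
        description: "Watermark column used by the is_incremental() filter."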
{{
  config(
    materialized = "incremental",
    unique_key = "order_id",
    incremental_strategy = "merge"
  )
}}

WITH src AS (
    SELECT *
    FROM {{ source('raw', 'orders') }}
    {% if is_incremental() %}
    -- On incremental runs, pull only rows newer than what the target already holds
    WHERE updated_at > (SELECT COALESCE(MAX(updated_at), '1900-01-01') FROM {{ this }})
    {% endif %}
)

SELECT
    order_id,
    customer_id,
    status,
    total_amount,
    updated_at,
    CURRENT_TIMESTAMP AS dbt_processed_at  -- load audit column
FROM src
Although dbt orchestrates the transformation, you still need to write performant SQL. Galaxy’s desktop SQL editor auto-completes source table names, validates Jinja syntax, and offers an AI copilot that can suggest the is_incremental() filter clause for you. Once committed, Galaxy’s collaboration features let teammates review and endorse your incremental logic before it hits production.
- Omitting unique_key often leads to duplicates. Always set it unless your data truly is append-only and immutable.
- Incremental models are not a substitute for history tracking: --full-refresh or snapshotting is still required for slowly changing dimensions and auditing.

Experiment locally: point dbt at a development warehouse, build a small incremental model, and benchmark the runtime before and after. Then layer in tests and documentation to productionize your pipeline.
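A quick way to run that comparison from the command line, assuming the fct_orders example and a development target already configured in profiles.yml:

# Baseline: rebuild the whole table, then time an incremental run against it
time dbt run -s fct_orders --full-refresh
time dbt run -s fct_orders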
As data volumes grow, full-table rebuilds become slow and expensive. Incremental models let analytics teams process only new or changed data, making pipelines faster, cheaper, and more reliable while scaling to billions of rows.
Run dbt run --full-refresh -s my_model. dbt will drop the existing table and recreate it from scratch.
If the change only affects future rows, an incremental run is fine. If it alters historical calculations, perform a full refresh to backfill.
Galaxy is a SQL editor that streamlines writing and reviewing the SQL used in your dbt incremental models. You’ll still execute the models with dbt, but Galaxy accelerates authoring and collaboration.
Yes. Snapshots capture row-level history, while incremental models maintain the latest state. Use snapshots for slowly changing dimensions and incrementals for facts.
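For comparison, a snapshot lives in its own file under snapshots/ and is built with dbt snapshot rather than dbt run. A minimal sketch using the same raw orders source, with the timestamp strategy keyed on order_id:

{% snapshot orders_snapshot %}

{{
    config(
      target_schema = 'snapshots',
      unique_key = 'order_id',
      strategy = 'timestamp',
      updated_at = 'updated_at'
    )
}}

select * from {{ source('raw', 'orders') }}

{% endsnapshot %}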