Incremental Models in dbt

Galaxy Glossary

What is an incremental model in dbt and how do you implement one?

An incremental model in dbt is a table-building strategy that appends or updates only the new or changed data since the last run, instead of rebuilding the entire dataset.


Description

Incremental Models in dbt: Build Faster Pipelines by Processing Only New Data

Learn what an incremental model is, how it works in dbt, and how to implement it correctly—complete with best practices, common pitfalls, and practical SQL examples you can run today.

What Is an Incremental Model?

An incremental model in dbt (data build tool) is a type of model that—after its first full run—executes subsequent runs by processing only the new or changed data and merging it into the existing table. Instead of truncating and re-creating the table, dbt issues INSERT, MERGE, or UPDATE statements, depending on your adapter and configuration. This dramatically reduces runtime and compute costs for large datasets while maintaining data freshness.
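For example, with the merge strategy dbt stages the new rows and then issues a statement along these lines (a simplified sketch; the exact SQL varies by adapter, and the table and column names here are illustrative):

-- Simplified sketch of what dbt's merge strategy emits.
-- Table and column names are illustrative; the real SQL is adapter-specific.
MERGE INTO analytics.fct_orders AS tgt
USING fct_orders__dbt_tmp AS src          -- staging relation holding the new delta
  ON tgt.order_id = src.order_id          -- matches on the configured unique_key
WHEN MATCHED THEN UPDATE SET
  status     = src.status,
  updated_at = src.updated_at
WHEN NOT MATCHED THEN
  INSERT (order_id, status, updated_at)
  VALUES (src.order_id, src.status, src.updated_at);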

Why Are Incremental Models Important?

Modern analytics stacks often work with ever-growing event streams, logs, or transactional data. Rebuilding a large fact table every hour quickly becomes impractical. Incremental models solve three key challenges:

  • Performance: Process only the delta, slashing wall-clock time and resource consumption.
  • Cost Efficiency: Many data warehouses charge by bytes scanned or slots used; incremental loads keep bills in check.
  • Operational Stability: Short, focused jobs are less likely to hit timeouts or memory limits and are easier to retry.

How Incremental Loading Works in dbt

1. Configuration

You declare an incremental model in the model’s .sql file using the materialized="incremental" config:

{{
  config(
    materialized = "incremental",
    unique_key = "order_id",
    incremental_strategy = "merge"
  )
}}

unique_key is optional but strongly recommended for upserts, and the default incremental_strategy varies by adapter, so setting it explicitly keeps behavior predictable.

2. First Run: Full Refresh

On the first execution, dbt creates the table by running the entire select statement and writing the full result set to your warehouse.
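Conceptually, the first run is just a CREATE TABLE AS over the full query (a simplified sketch; the schema and table names are illustrative, and dbt generates the actual DDL per adapter):

-- Roughly what the first run of an incremental model does.
CREATE TABLE analytics.fct_orders AS
SELECT *
FROM raw.orders;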

3. Subsequent Runs: Filtering with is_incremental()

Inside your model, wrap delta-detection logic in the is_incremental() Jinja macro so it executes only during incremental runs. Example:

SELECT *
FROM {{ source('raw', 'orders') }} o
{% if is_incremental() %}
WHERE o.updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}

During the first run or a full refresh, the WHERE clause is skipped; on incremental runs, dbt fetches only rows with a later updated_at than the maximum already in the target table.

4. Writing the Delta

dbt hands the result set to the adapter-specific incremental strategy:

  • append – naïvely appends new rows (fastest, but can create duplicates).
  • merge – uses SQL MERGE/INSERT ... ON CONFLICT to upsert based on unique_key.
  • delete+insert – deletes matching keys and inserts fresh rows (useful when MERGE isn’t supported); see the sketch below.
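For instance, the delete+insert strategy compiles down to two statements against a staged delta (a simplified sketch; fct_orders__dbt_tmp stands in for dbt's internal staging relation):

-- Simplified sketch of the delete+insert strategy.
-- fct_orders__dbt_tmp is a placeholder for dbt's internal staging relation.
DELETE FROM analytics.fct_orders
WHERE order_id IN (SELECT order_id FROM fct_orders__dbt_tmp);

INSERT INTO analytics.fct_orders
SELECT * FROM fct_orders__dbt_tmp;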

Step-by-Step Implementation Guide

  1. Identify Change Data: Choose a reliable watermark column such as updated_at or an auto-incrementing id.
  2. Create the Model: Place your logic in models/fct_orders.sql and add the config block shown above.
  3. Use is_incremental(): Filter the source data to only the new slice.
  4. Set unique_key: Declare the column(s) that uniquely identify each record. This enables de-duplication on upserts.
  5. Run dbt: Execute dbt run -s fct_orders. The first run builds the full table.
  6. Schedule Regular Runs: In production, orchestrate dbt hourly or daily via Airflow, Dagster, or GitHub Actions.

Best Practices

Choose an Immutable or Monotonic Watermark

Use a strictly increasing timestamp or sequence as your filter column. A pure high watermark will miss late-arriving rows, so if your source sends corrections, combine updated_at filtering with a merge strategy keyed on unique_key, as sketched below.
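One common pattern is a lookback window: re-process the last few days so late corrections are picked up and de-duplicated by the merge. A sketch, assuming a Snowflake-style DATEADD and a three-day window you would tune to your data:

{% if is_incremental() %}
  -- Re-process a 3-day lookback window so late-arriving corrections are
  -- captured; the merge on unique_key de-duplicates the overlap.
  WHERE updated_at > (SELECT DATEADD('day', -3, MAX(updated_at)) FROM {{ this }})
{% endif %}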

Limit Incremental Window Size

Retrieving MAX(updated_at) can strain large tables. Instead, store your last run timestamp in a variable or use partitioned tables and query only the latest partition.
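For example, the watermark can be injected by your orchestrator through a dbt variable instead of a subquery over the target table (a sketch; last_run_ts is a hypothetical variable name passed via --vars):

{% if is_incremental() %}
  -- Watermark supplied by the scheduler, e.g.:
  -- dbt run -s fct_orders --vars '{"last_run_ts": "2024-06-01 00:00:00"}'
  WHERE updated_at > '{{ var("last_run_ts") }}'
{% endif %}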

Test with Full-Refresh Flags

Run dbt run --full-refresh -s fct_orders periodically (e.g., weekly) to ensure that your incremental logic still produces deterministic results.

Document with dbt Docs

Add descriptions, columns, and config to your YAML files so teammates understand the incremental assumptions. Galaxy’s rich metadata panel will surface these details directly while you edit SQL.

Practical Example: Incremental Orders Fact Table

{{
  config(
    materialized = "incremental",
    unique_key = "order_id",
    incremental_strategy = "merge"
  )
}}

WITH src AS (
  SELECT *
  FROM {{ source('raw', 'orders') }}
  {% if is_incremental() %}
  WHERE updated_at > (SELECT COALESCE(MAX(updated_at), '1900-01-01') FROM {{ this }})
  {% endif %}
)

SELECT
  order_id,
  customer_id,
  status,
  total_amount,
  updated_at,
  CURRENT_TIMESTAMP AS dbt_processed_at
FROM src

Note that the model does not end with a semicolon: dbt wraps the query in adapter-specific DDL, and a trailing semicolon can break compilation.

How Galaxy Helps

Although dbt orchestrates the transformation, you still need to write performant SQL. Galaxy’s desktop SQL editor auto-completes source table names, validates Jinja syntax, and offers an AI copilot that can suggest the is_incremental() filter clause for you. Once committed, Galaxy’s collaboration features let teammates review and endorse your incremental logic before it hits production.

Common Misconceptions

  • “Every model should be incremental”: Not necessarily. You can mix incremental and full-rebuild models in the same project; choose per model based on data volume and latency needs.
  • “No Unique Key Needed”: Skipping unique_key often leads to duplicates. Always set it unless your data truly is append-only and immutable.
  • “Infinite Incremental Versions”: Periodic --full-refresh or snapshotting is still required for slowly changing dimensions and auditing.

Next Steps

Experiment locally: point dbt at a development warehouse, build a small incremental model, and benchmark the runtime before and after. Then layer in tests and documentation to productionize your pipeline.

Why Incremental Models in dbt Are Important

As data volumes grow, full-table rebuilds become slow and expensive. Incremental models let analytics teams process only new or changed data, making pipelines faster, cheaper, and more reliable while scaling to billions of rows.

Frequently Asked Questions (FAQs)

How do I force a full rebuild of an incremental model?

Run dbt run --full-refresh -s my_model. dbt will drop the existing table and recreate it from scratch.

What happens if I change the SQL logic in my incremental model?

If the change only affects future rows, an incremental run is fine. If it alters historical calculations, perform a full refresh to backfill.

How does Galaxy relate to incremental models?

Galaxy is a SQL editor that streamlines writing and reviewing the SQL used in your dbt incremental models. You’ll still execute the models with dbt, but Galaxy accelerates authoring and collaboration.

Can I mix incremental and snapshot strategies?

Yes. Snapshots capture row-level history, while incremental models maintain the latest state. Use snapshots for slowly changing dimensions and incrementals for facts.
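A minimal snapshot looks like this (a sketch; it lives under snapshots/ and the names are illustrative):

{% snapshot orders_snapshot %}

{{
  config(
    target_schema = 'snapshots',
    unique_key = 'order_id',
    strategy = 'timestamp',
    updated_at = 'updated_at'
  )
}}

SELECT * FROM {{ source('raw', 'orders') }}

{% endsnapshot %}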
