dbt Ephemeral Models

Galaxy Glossary

What is a dbt ephemeral model and how do you implement it?

An ephemeral model in dbt is a transient SQL transformation that never materializes as a table or view; instead, its compiled SQL is inlined into downstream models at run-time, resulting in zero footprint in the warehouse.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Description

An ephemeral model in dbt is a lightweight transformation that exists only during compilation. Rather than persisting its results as a table or view, dbt injects ("inlines") the model’s SQL into every downstream model that references it. This keeps your warehouse lean, speeds up DAG execution, and eliminates extra permission work—at the cost of re-computing the subquery each time it’s used.

What Is an Ephemeral Model in dbt?

dbt (data build tool) supports four core materializations—table, view, incremental, and ephemeral. While the first three create persistent objects in your data warehouse, the ephemeral materialization is non-persistent. The compiled SQL from an ephemeral model is inserted as a CTE or subquery into every model that calls ref() on it. As a result, no physical object is created, and the warehouse stores zero bytes for that transformation.

Why Use Ephemeral Models?

  • Avoid Clutter: Large dbt projects can generate hundreds of tables/views. Ephemeral models keep intermediate logic out of your schema.
  • Speed Up DAG Builds: Eliminating DDL/DML steps for intermediate artifacts reduces run-time, especially in warehouses where object creation is slow.
  • Enforce Modularity: You can break SQL into small, testable units without worrying about permissioning or naming collisions for each layer.
  • Lower Costs: Some warehouses charge for storage or table metadata. Ephemeral models generate none.

How dbt Builds Ephemeral Models Under the Hood

During dbt run or dbt compile:

  1. dbt parses your model DAG and identifies nodes with materialized: ephemeral.
  2. For each downstream model that references the node via ref('my_ephemeral_model'), dbt replaces that reference with the full SQL of the ephemeral model, wrapped as a Common Table Expression (CTE).
  3. The process repeats recursively, so multi-layer chains of ephemerals are fully expanded.
  4. The final compiled SQL is executed. Since there’s no CREATE TABLE/VIEW, the warehouse sees only the outer model’s DDL.

The result is a single warehouse query per non-ephemeral model, no matter how many ephemerals contribute inside.

When to Use (and When to Avoid) Ephemeral Models

Ideal Scenarios

  • Lightweight transformations (simple joins, filters, type casts).
  • Logic needed by a small number of downstream models.
  • Warehouses that charge per table, e.g., Snowflake’s meta-data storage.

Not Recommended

  • CPU-heavy transformations reused across many models. Re-executing them can become expensive.
  • Intermediate data required for debugging or auditing. Because there’s no physical table, you can’t inspect the state mid-flow without recompiling SQL.
  • Use cases requiring data lineage tools that trace physical objects—ephemeral nodes don’t appear in the warehouse.

Step-by-Step Implementation Guide

1. Create the Model File

Add a new SQL file in your models/ directory—for example, models/staging/stg_page_views_ephemeral.sql.

2. Add the Materialization Config

-- models/staging/stg_page_views_ephemeral.sql
{{ config(materialized = 'ephemeral') }}

SELECT
user_id,
page_url,
event_timestamp
FROM {{ ref('raw_page_views') }}
WHERE event_name = 'page_view';

3. Reference the Model

-- models/marts/fct_sessions.sql

WITH page_views AS (
SELECT *
FROM {{ ref('stg_page_views_ephemeral') }}
),
...

4. Run and Test

$ dbt run --select fct_sessions

Inspect the compiled SQL in target/compiled; you’ll see the CTE page_views contains the raw SQL from the ephemeral model.

Best Practices

  • Name Clearly: Many teams prefix with stg_ or suffix _ephemeral to convey non-persistence.
  • Keep It Lightweight: Complex aggregations or window functions are better suited to table or incremental.
  • Document Well: Because the model disappears in the warehouse, strong in-repo documentation (YAML descriptions, schema tests) is crucial.
  • Monitor Performance: Use dbt’s --debug and warehouse query logs to ensure inlined SQL isn’t causing redundant scans.

Common Mistakes and How to Fix Them

  • Mistake 1 – Treating Ephemerals Like Persistent Tables: Developers sometimes attempt to grant permissions or query them directly. Fix by remembering they don’t exist post-compile.
  • Mistake 2 – Heavy Computation: Running complex transformations as ephemeral inflates downstream query time. Convert to materialized: table or incremental.
  • Mistake 3 – Over-reusability: If an ephemeral is referenced by dozens of models, consider persisting it to avoid N× recomputation.

Practical Example: Sessionization

Suppose you need to sessionize web events. You can:

  1. Create stg_page_views_ephemeral that filters raw events.
  2. Build stg_sessionize_ephemeral that applies window functions over those views.
  3. Finally, materialize fct_sessions as a table. The two staging layers never touch the warehouse.

Working Code Example

Below is a minimal, end-to-end example you can copy into a dbt project.

-- models/staging/stg_clicks_ephemeral.sql
{{ config(materialized = 'ephemeral') }}
SELECT *
FROM {{ source('app', 'events') }}
WHERE event_type = 'click';

-- models/staging/stg_purchases_ephemeral.sql
{{ config(materialized = 'ephemeral') }}
SELECT *
FROM {{ source('app', 'events') }}
WHERE event_type = 'purchase';

-- models/marts/fct_user_activity.sql
{{ config(materialized = 'table') }}
WITH clicks AS (
SELECT * FROM {{ ref('stg_clicks_ephemeral') }}
),
purchases AS (
SELECT * FROM {{ ref('stg_purchases_ephemeral') }}
)
SELECT
user_id,
COUNT(clicks.*) AS click_cnt,
COUNT(purchases.*) AS purchase_cnt
FROM clicks
LEFT JOIN purchases USING (user_id)
GROUP BY 1;

You’ll end up with a single persisted table (fct_user_activity) and no intermediate objects.

Developing Ephemeral Models in Galaxy

Because Galaxy is a modern SQL editor with contextual AI, you can author .sql model files for dbt locally, validate them against your warehouse, and leverage Galaxy’s AI copilot to refactor or optimize inlined SQL swiftly. The desktop app’s version control-friendly environment makes it particularly well-suited for iterating on ephemeral transformations that need rapid feedback but don’t require schema changes in the warehouse.

Why dbt Ephemeral Models is important

As data pipelines scale, the explosion of intermediate tables can slow down development, inflate storage costs, and complicate governance. Ephemeral models give data engineers a lever to keep DAGs modular without polluting the warehouse, enabling faster iterations and cheaper, more maintainable analytics layers.

dbt Ephemeral Models Example Usage


"What is the difference between an ephemeral model and a view in dbt?"

Common Mistakes

Frequently Asked Questions (FAQs)

What happens to data generated by an ephemeral model?

No data is ever persisted. The SQL is inlined into downstream queries, so the warehouse only stores results for the outer model(s).

When should I choose an ephemeral model over a view or table?

Use ephemerals for lightweight, rarely reused logic where reducing warehouse clutter outweighs the cost of recomputation. If performance degrades or multiple models reuse the same logic, switch to a view or table.

Do ephemeral models slow down dbt runs?

They generally speed up runs by avoiding table creation, but if the inlined SQL is heavy or reused widely, total query time can rise. Monitor warehouse execution plans to decide.

Can I develop dbt projects, including ephemeral models, inside Galaxy?

Yes. Galaxy’s desktop SQL editor lets you create and edit dbt model files locally, run them against your warehouse, and use the AI copilot to refactor SQL—all without leaving the IDE.

Want to learn about other SQL terms?