Documenting dbt models means systematically adding human-readable descriptions, annotations, and metadata to every model so that anyone can understand its purpose, logic, and downstream impact.
Best Practices for Documenting dbt Models
High-quality documentation turns a collection of SQL transformations into a reliable, discoverable analytics framework. Learn strategies, tooling tips, and common pitfalls to make dbt documentation a core asset—not an afterthought.
In dbt, documentation refers to the YAML files, model-level descriptions, column-level descriptions, and code comments that are compiled into an interactive website by running `dbt docs generate && dbt docs serve`. The docs site combines parsed SQL, lineage graphs, and the Markdown-rendered YAML descriptions so analysts and engineers can explore the project without opening an IDE.
Complete documentation typically covers:
- Model description: an English sentence or two summarizing what the table represents and why it exists.
- Column descriptions: short business definitions and data types for every column, including units, allowed values, and calculation logic for derived fields.
- Sources and tests: explain how raw sources arrive (e.g., Fivetran sync, S3 drop) and what each dbt test enforces.
- Code comments: inline SQL comments (`--`) that clarify non-obvious join conditions, filters, or performance tweaks.
- Usage examples: optional snippets that demonstrate the table in SELECT statements; extremely helpful for self-serve teams.
Treat YAML docs with the same rigor as SQL: code review, linting, CI enforcement, and version control.
Document intent while context is fresh. A pull request should be blocked if `description:` fields are empty.
Standardize language and structure. A proven pattern:
description: |
  {{doc("model_name")}}
  **Business purpose:** <text>
  **Key logic:** <text>
  **Upstream sources:** <tables>
  **Downstream consumers:** <dashboards / apps>
Use `docs` blocks for rich Markdown: wrap longform content in `{{doc("some_id")}}` references and place the Markdown files in `docs/`. This keeps YAML concise.
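For instance, a `docs` block lives in a Markdown file and is referenced from the model's YAML (a minimal sketch; the block name `orders_overview` and file path are illustrative, not prescribed):

```
# docs/orders.md
{% docs orders_overview %}
Daily snapshot of Shopify orders, including customer segment
and FX-converted revenue. Used by finance and ops KPIs.
{% enddocs %}
```

```yaml
# models/fct_orders_daily.yml
models:
  - name: fct_orders_daily
    description: '{{ doc("orders_overview") }}'
```

The long prose stays in Markdown where it is easy to edit and review, while the YAML stays a thin index.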
Auto-generate documentation stubs with tools such as `dbt-sugar`, Elementary, or queries against your warehouse's `INFORMATION_SCHEMA`. Then enforce coverage in CI: the `dbt-docs-coverage` package fails builds when coverage drops below target. Where possible, hyperlink Jira tickets, Metrics Layer definitions, or Miro diagrams to give non-technical readers the full story.
Regenerate the docs site on a schedule with `dbt docs generate` and host it behind company SSO. Some teams export lineage JSON to tools like Amundsen or DataHub for centralized discovery.
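As a sketch of what that lineage export involves: the `parent_map` section of dbt's `manifest.json` already encodes the dependency graph as parent-to-child edges, so a few lines of Python can flatten it into whatever shape a catalog tool expects (the sample manifest below is invented for illustration):

```python
import json  # needed only when loading a real target/manifest.json


def lineage_edges(manifest: dict):
    """Yield (parent, child) edges from a dbt manifest's parent_map."""
    for child, parents in manifest.get("parent_map", {}).items():
        for parent in parents:
            yield parent, child


# Invented sample standing in for json.load(open("target/manifest.json"))
sample = {
    "parent_map": {
        "model.proj.fct_orders_daily": [
            "model.proj.stg_shopify__orders",
            "model.proj.dim_customers",
        ],
    }
}

edges = list(lineage_edges(sample))
# Each (parent, child) pair can then be pushed to a catalog's lineage API.
```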
Every quarter, allocate time to prune deprecated columns, update examples, and verify that stated tests still exist.
A well-named model (`fct_orders_daily`) plus tags (`marts`, `finance`) reduces the need for verbose docs and improves lineage filtering.
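Such tags can be applied at the folder level in `dbt_project.yml` rather than repeated per model (a sketch; the project and folder names are assumptions about a typical layout):

```yaml
# dbt_project.yml (hypothetical project layout)
models:
  my_project:
    marts:
      finance:
        +tags: ["marts", "finance"]
```

Every model under `models/marts/finance/` then inherits both tags, which keeps naming conventions and lineage filters consistent without per-file boilerplate.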
-- models/fct_orders_daily.sql
{{ config(materialized='table') }}

with orders as (
    select * from {{ ref('stg_shopify__orders') }}
),

customers as (
    select * from {{ ref('dim_customers') }}
)

select
    o.order_id,
    o.order_date,
    c.customer_id,
    c.segment,
    o.total_amount_usd,  -- applied fx conversion; see column docs
    o.order_status
from orders o
left join customers c using (customer_id)
# models/fct_orders_daily.yml
version: 2

models:
  - name: fct_orders_daily
    description: |
      Daily snapshot of Shopify orders including customer segment and FX-converted revenue.
      Used by finance and ops KPIs. Upstream data from stg_shopify__orders.
    columns:
      - name: order_id
        description: Primary key from Shopify orders.
      - name: order_date
        description: Date the order was placed (UTC).
      - name: customer_id
        description: Foreign key to dim_customers.
      - name: segment
        description: Customer segmentation label computed in dim_customers.
      - name: total_amount_usd
        description: Revenue converted to USD using the daily_rates FX table.
      - name: order_status
        description: Current status (open, fulfilled, refunded, cancelled).
Mistake: documenting only mart-level models. Why it’s wrong: mid-layer staging models often contain crucial cleansing logic. Fix: enforce docs for every model in CI, or at least everything tagged `stg` and `int`.
Mistake: treating descriptions as optional in code review. Why it’s wrong: reviewers skip them when deadlines loom. Fix: add a PR template checkbox: “All new/modified models have descriptions & examples.”
Mistake: leaving metric definitions vague. Why it’s wrong: “Revenue” means different things to Finance vs. Sales. Fix: include units, currency, and calculation method in every numeric column’s description.
Because Galaxy is a modern SQL editor aware of your warehouse catalog, it can surface dbt model descriptions inline as you query. When you hover over `fct_orders_daily.total_amount_usd`, Galaxy shows the YAML definition and even suggests JOIN clauses based on lineage, leveraging the documentation you already wrote.
You can enumerate every model that needs documentation with `dbt ls --resource-type model --output name`.

Without clear documentation, a dbt project quickly devolves into opaque SQL spaghetti. Consistent, enforced docs transform the project into a self-service analytics layer where any stakeholder can trace metrics, validate assumptions, and confidently reuse data. This accelerates onboarding, reduces duplicated work, and minimizes costly misinterpretations of business logic.
Use the `dbt-docs-coverage` package or a custom script that parses `manifest.json` and fails the build when coverage falls below a threshold.
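Such a custom check can be sketched in a few lines of Python. The `nodes` / `resource_type` / `description` layout below matches dbt's `manifest.json`, but the sample data and the 90% threshold are assumptions you would tune per project:

```python
import json  # needed only when loading a real target/manifest.json


def description_coverage(manifest: dict) -> float:
    """Fraction of models in a dbt manifest that have a non-empty description."""
    models = [
        node for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
    ]
    if not models:
        return 1.0  # vacuously covered
    documented = sum(1 for m in models if (m.get("description") or "").strip())
    return documented / len(models)


# Invented sample standing in for json.load(open("target/manifest.json"))
sample = {
    "nodes": {
        "model.proj.fct_orders_daily": {
            "resource_type": "model",
            "description": "Daily snapshot of Shopify orders.",
        },
        "model.proj.stg_shopify__orders": {
            "resource_type": "model",
            "description": "",  # undocumented: counts against coverage
        },
        "test.proj.not_null_order_id": {"resource_type": "test"},  # ignored
    }
}

coverage = description_coverage(sample)
THRESHOLD = 0.9  # assumed target; tune per project
if coverage < THRESHOLD:
    print(f"Docs coverage {coverage:.0%} is below the {THRESHOLD:.0%} target")
```

In CI, the script would read the real `target/manifest.json` and exit non-zero when coverage drops, blocking the merge.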
Yes. Tools like `dbt-sugar`, Elementary, or warehouse metadata queries can pre-populate YAML, which you then refine manually.
Galaxy’s context-aware SQL editor surfaces your dbt model and column descriptions inline, offers AI-generated definitions for new fields, and validates joins using dbt lineage—all while you write queries.
Absolutely. Source and test descriptions clarify data quality guarantees, and exposures map models to downstream dashboards or apps, completing the lineage picture.