Best Practices for Documenting dbt Models

Galaxy Glossary

What are the best practices for documenting dbt models?

Documenting dbt models means systematically adding human-readable descriptions, annotations, and metadata to every model so that anyone can understand its purpose, logic, and downstream impact.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.

Description

Best Practices for Documenting dbt Models

High-quality documentation turns a collection of SQL transformations into a reliable, discoverable analytics framework. Learn strategies, tooling tips, and common pitfalls to make dbt documentation a core asset—not an afterthought.

Quick Refresher: What Is dbt Documentation?

In dbt, documentation refers to the YAML files, model-level descriptions, column-level descriptions, and code comments that are compiled into an interactive website by running dbt docs generate && dbt docs serve. The docs site combines parsed SQL, lineage graphs, and the Markdown-rendered YAML descriptions so analysts and engineers can explore the project without opening an IDE.

Why Meticulous Model Docs Matter

  • Knowledge sharing & onboarding: New teammates ramp faster when every model explains its purpose, business logic, and assumptions.
  • Trust & governance: Stakeholders can trace KPIs to raw sources, reducing ad-hoc requests and audit anxiety.
  • Debugging & refactoring: Good docs flag edge cases and deprecated fields before someone mis-uses them.
  • AI & tooling leverage: Context-aware SQL editors like Galaxy or dbt’s own AI helpers use model metadata to improve autocomplete and code generation.

Core Components of dbt Model Documentation

1. Model-Level Description

An English sentence or two summarizing what the table represents and why it exists.

2. Column-Level Descriptions

Short business definitions and data types for every column. Should include units, allowed values, and calculation logic for derived fields.

3. Source & Test Descriptions

Explain how raw sources arrive (e.g., Fivetran sync, S3 drop) and what each dbt test enforces.

4. Code Comments

Inline SQL comments (--) that clarify non-obvious join conditions, filters, or performance tweaks.

5. Usage Examples

Optional snippets that demonstrate the table in SELECT statements—extremely helpful for self-serve teams.

Best Practices for Writing dbt Documentation

Adopt a "docs-as-code" Culture

Treat YAML docs with the same rigor as SQL: code review, linting, CI enforcement, and version control.

Write Descriptions Before or During Development

Document intent while context is fresh. A pull request should be blocked if description: fields are empty.

Follow a Consistent Template

Standardize language and structure. A proven pattern:

description: |
{{doc("model_name")}}
**Business purpose:** <text>
**Key logic:** <text>
**Upstream sources:** <tables>
**Downstream consumers:** <dashboards / apps>

Use dbt’s docs Blocks for Rich Markdown

Wrap longform content in {{doc("some_id")}} and place Markdown files in docs/. Keeps YAML concise.

Automate What You Can

  • Autogenerate column descriptions for raw sources via dbt-sugar, Elementary, or your warehouse’s INFORMATION_SCHEMA.
  • Leverage AI tools: Galaxy’s Copilot can pull definitions from existing tables or business glossaries as you write YAML.
  • Set CI checks: dbt’s dbt-docs-coverage package fails builds when coverage drops below target.

Link Out to Business Context

Where possible, hyperlink Jira tickets, Metrics Layer definitions, or Miro diagrams to give non-technical readers the full story.

Keep Docs Close to the Code But Visible to Everyone

Run dbt docs serve on a schedule and host behind company SSO. Some teams export lineage JSON to tools like Amundsen or DataHub for centralized discovery.

Schedule Documentation Debt Sprints

Every quarter, allocate time to prune deprecated columns, update examples, and verify that stated tests still exist.

Use Descriptive Model Names & Tags

A well-named model (fct_orders_daily) plus tags (marts, finance) reduce the need for verbose docs and improve lineage filtering.

End-to-End Example

-- models/fct_orders_daily.sql
{{ config(materialized = 'table') }}

with orders as (
select * from {{ ref('stg_shopify__orders') }}
),

customers as (
select * from {{ ref('dim_customers') }}
)

select
o.order_id,
o.order_date,
c.customer_id,
c.segment,
o.total_amount_usd,
-- applied fx conversion; see column docs
o.order_status
from orders o
left join customers c using(customer_id)
# models/fct_orders_daily.yml
version: 2
models:
- name: fct_orders_daily
description: |
Daily snapshot of Shopify orders including customer segment and FX-converted revenue.
Used by finance and ops KPIs. Upstream data from stg_shopify__orders.
columns:
- name: order_id
description: Primary key from Shopify orders.
- name: order_date
description: Date the order was placed (UTC).
- name: customer_id
description: Foreign key to dim_customers.
- name: segment
description: Customer segmentation label computed in dim_customers.
- name: total_amount_usd
description: Revenue converted to USD using daily_rates FX table.
- name: order_status
description: Current status (open, fulfilled, refunded, cancelled).

Common Pitfalls & How to Avoid Them

1. Documenting Only Final Models

Why it’s wrong: Mid-layer staging models often contain crucial cleansing logic.
Fix: Enforce docs for every model in CI, or at least all anything tagged stg and int.

2. Treating Docs as Secondary Pull-Request Items

Why it’s wrong: Reviewers skip them when deadlines loom.
Fix: Add a PR template checkbox: “All new/modified models have descriptions & examples.”

3. Writing Vague Column Definitions

Why it’s wrong: “Revenue” means different things to Finance vs. Sales.
Fix: Include units, currency, and calculation method in every numeric column’s description.

How Galaxy Helps

Because Galaxy is a modern SQL editor aware of your warehouse catalog, it can surface dbt model descriptions inline as you query. When you hover on fct_orders_daily.total_amount_usd, Galaxy shows the YAML definition and even suggests JOIN clauses based on lineage—leveraging the documentation you already wrote.

Next Steps

  1. Audit current dbt docs coverage with dbt ls --resource-type model --output name.
  2. Establish a style guide and CI rule for new PRs.
  3. Use Galaxy or dbt Cloud to auto-preview docs while coding.

Why Best Practices for Documenting dbt Models is important

Without clear documentation, a dbt project quickly devolves into opaque SQL spaghetti. Consistent, enforced docs transform the project into a self-service analytics layer where any stakeholder can trace metrics, validate assumptions, and confidently reuse data. This accelerates onboarding, reduces duplicated work, and minimizes costly misinterpretations of business logic.

Best Practices for Documenting dbt Models Example Usage


-- Find high-value refunds
select order_id, total_amount_usd
from 
where order_status = 'refunded'
  and total_amount_usd &gt; 500

Best Practices for Documenting dbt Models Syntax



Common Mistakes

Frequently Asked Questions (FAQs)

How do I enforce documentation coverage in CI?

Use the dbt-docs-coverage package or a custom script that parses the manifest.json and fails the build when coverage falls below a threshold.

Can I autogenerate column descriptions in dbt?

Yes. Tools like dbt-sugar, Elementary, or warehouse metadata queries can pre-populate YAML, which you then refine manually.

How does Galaxy improve the dbt documentation workflow?

Galaxy’s context-aware SQL editor surfaces your dbt model and column descriptions inline, offers AI-generated definitions for new fields, and validates joins using dbt lineage—all while you write queries.

Should I document tests and exposures too?

Absolutely. Source and test descriptions clarify data quality guarantees, and exposures map models to downstream dashboards or apps, completing the lineage picture.

Want to learn about other SQL terms?

Trusted by top engineers on high-velocity teams
Aryeo Logo
Assort Health
Curri
Rubie
BauHealth Logo
Truvideo Logo
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.