dbt is an open-source framework that lets data teams transform raw warehouse tables into tested, documented, production-ready datasets using version-controlled SQL.
dbt (data build tool) compiles version-controlled SQL into executable warehouse code, letting analysts create modular, testable data transformations.
dbt parses .sql model files, resolves Jinja macros, builds a dependency graph (DAG) from the ref() calls between models, and issues CREATE TABLE AS or CREATE VIEW AS statements in your warehouse.
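The dependency-graph step can be illustrated with a minimal Python sketch. It is not dbt's parser: it assumes every dependency appears as a literal one-argument ref('name') call, ignores source(), two-argument refs, and macro-generated SQL, and uses hypothetical model names. It extracts refs with a regular expression and orders models with the standard library's graphlib.TopologicalSorter so that parents build before children.

```python
import re
from graphlib import TopologicalSorter

# Matches one-argument ref('name') / ref("name") calls.
# Assumption: two-argument refs, source(), and macro-generated refs are ignored.
REF_PATTERN = re.compile(r"""ref\(\s*['"]([^'"]+)['"]\s*\)""")


def build_dag(models: dict[str, str]) -> dict[str, set[str]]:
    """Map each model name to the set of models it references via ref()."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def run_order(models: dict[str, str]) -> list[str]:
    """Return model names so every parent precedes its children (raises CycleError on cycles)."""
    return list(TopologicalSorter(build_dag(models)).static_order())


# Hypothetical three-model project chained by ref().
project = {
    "raw_orders": "select * from raw.orders",
    "orders_plus": "select * from {{ ref('raw_orders') }} where total > 10",
    "order_summary": "select count(*) as n from {{ ref('orders_plus') }}",
}

print(run_order(project))  # ['raw_orders', 'orders_plus', 'order_summary']
```

dbt builds this graph itself during parsing and exposes it through node selection; for example, dbt run --select orders_plus+ also runs everything downstream of orders_plus.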
dbt only orchestrates SQL: the heavy lifting happens inside your warehouse (Snowflake, BigQuery, Redshift, Postgres, and others), so transformations scale with warehouse compute and remain subject to the warehouse's access controls.
dbt provides lineage, tests, documentation, and reusable macros, and it fits naturally into CI/CD workflows, which reduces errors and on-call fatigue compared with ad-hoc SQL scripts.
Run pip install dbt-core dbt-bigquery, substituting the adapter for your warehouse (for example dbt-snowflake or dbt-postgres). Then run dbt --version to confirm a clean install.
Initialize a project with dbt init my_shop, then configure profiles.yml (located in ~/.dbt/ by default) to point at your warehouse.
Add models/orders_plus.sql:
{% set min_order = 10 %}
select *
from {{ ref('raw_orders') }}
where total > {{ min_order }}
Execute dbt run --select orders_plus. dbt builds the model as a view (the default materialization) and records lineage metadata in target/manifest.json.
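What dbt sends to the warehouse can be approximated in Python, with the built-in sqlite3 module standing in for the warehouse. This sketch replaces Jinja rendering with plain string substitution, treats min_order as an already-set variable (the {% set %} line is omitted), assumes ref('raw_orders') resolves to main.raw_orders, and uses drop-then-create because SQLite has no CREATE OR REPLACE VIEW. Real dbt renders through a full Jinja environment and emits adapter-specific DDL.

```python
import sqlite3

# Model body with the {% set %} line dropped; min_order is supplied separately.
MODEL_SQL = """
select *
from {{ ref('raw_orders') }}
where total > {{ min_order }}
"""


def compile_model(sql: str, refs: dict[str, str], variables: dict[str, object]) -> str:
    """Naive stand-in for Jinja rendering: literal string substitution only."""
    for name, relation in refs.items():
        sql = sql.replace("{{ ref('%s') }}" % name, relation)
    for name, value in variables.items():
        sql = sql.replace("{{ %s }}" % name, str(value))
    return sql.strip()


def materialize_view(conn: sqlite3.Connection, model: str, compiled_sql: str) -> None:
    """Wrap the compiled select in view DDL, roughly as a view materialization does."""
    conn.execute(f"drop view if exists {model}")
    conn.execute(f"create view {model} as {compiled_sql}")


conn = sqlite3.connect(":memory:")
conn.execute("create table raw_orders (id integer, customer_id integer, total real)")
conn.executemany(
    "insert into raw_orders values (?, ?, ?)",
    [(1, 100, 5.0), (2, 101, 25.0), (3, 100, 12.5)],
)

compiled = compile_model(MODEL_SQL, {"raw_orders": "main.raw_orders"}, {"min_order": 10})
materialize_view(conn, "orders_plus", compiled)
print(compiled)
print(conn.execute("select id from orders_plus order by id").fetchall())  # [(2,), (3,)]
```

To inspect the real rendered SQL, dbt compile writes it under target/compiled/ without executing anything.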
Define tests in a YAML properties file (for example, models/schema.yml):
version: 2
models:
  - name: orders_plus
    columns:
      - name: id
        tests:
          - not_null
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: customer_id
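dbt's built-in generic tests compile to queries that select failing rows, and a test passes when the query returns nothing. The Python sketch below approximates the not_null and relationships checks against an in-memory SQLite database; the tables and rows are hypothetical, and dbt's generated SQL differs in detail (it uses CTEs and adapter-specific quoting).

```python
import sqlite3


def not_null_failures(conn: sqlite3.Connection, model: str, column: str) -> list[tuple]:
    """Rows violating not_null: any row where the column is NULL."""
    return conn.execute(f"select * from {model} where {column} is null").fetchall()


def relationships_failures(
    conn: sqlite3.Connection, model: str, column: str, to: str, field: str
) -> list[tuple]:
    """Non-null child values with no matching parent value (orphaned foreign keys)."""
    sql = f"""
        select child.{column}
        from {model} as child
        left join {to} as parent on child.{column} = parent.{field}
        where child.{column} is not null and parent.{field} is null
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table customers (customer_id integer)")
conn.execute("create table orders_plus (id integer, customer_id integer)")
conn.executemany("insert into customers values (?)", [(100,), (101,)])
conn.executemany("insert into orders_plus values (?, ?)", [(1, 100), (2, 999), (None, 101)])

print(not_null_failures(conn, "orders_plus", "id"))  # [(None, 101)]
print(relationships_failures(conn, "orders_plus", "customer_id", "customers", "customer_id"))  # [(999,)]
```

In a real project, dbt test --select orders_plus runs both checks and reports any failures.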
Write model and column descriptions in YAML, then run dbt docs generate && dbt docs serve to build and browse a searchable documentation site that includes a lineage graph.
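The lineage on the docs site is derived from dbt's artifacts. As a sketch, assuming the manifest.json in target/ contains a top-level parent_map from each node ID to its direct parents (true of recent dbt versions, but check the artifact schema for yours), upstream lineage can be walked with a few lines of Python. The fake_manifest dict and its node IDs are hypothetical stand-ins for a real file.

```python
import json
from pathlib import Path


def load_manifest(path: str = "target/manifest.json") -> dict:
    """Read a dbt manifest from disk (path assumes the default target directory)."""
    return json.loads(Path(path).read_text())


def upstream(manifest: dict, node_id: str) -> set[str]:
    """Collect every ancestor of node_id by walking the manifest's parent_map."""
    parent_map = manifest["parent_map"]
    seen: set[str] = set()
    stack = list(parent_map.get(node_id, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parent_map.get(parent, []))
    return seen


# Hypothetical stand-in for a real manifest, keeping only the key used above.
fake_manifest = {
    "parent_map": {
        "model.my_shop.order_summary": ["model.my_shop.orders_plus"],
        "model.my_shop.orders_plus": ["model.my_shop.raw_orders"],
        "model.my_shop.raw_orders": [],
    }
}

print(sorted(upstream(fake_manifest, "model.my_shop.order_summary")))
# ['model.my_shop.orders_plus', 'model.my_shop.raw_orders']
```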
Use git branches, continuous integration, slim CI (building only modified models and their downstream dependents), consistent naming conventions, and scheduled production runs to keep pipelines healthy.
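Slim CI typically runs something like dbt build --select state:modified+ --state path/to/prod_artifacts (a placeholder path), which builds only nodes that changed relative to a production manifest plus everything downstream of them. The Python sketch below approximates that selection under simplifying assumptions: each node is reduced to a single content checksum, a node counts as modified if it is new or its checksum differs, and descendants come from a child map. dbt's real state comparison also considers configs, macros, and other properties.

```python
def modified_plus(
    prod_checksums: dict[str, str],
    dev_checksums: dict[str, str],
    child_map: dict[str, list[str]],
) -> set[str]:
    """Approximate dbt's state:modified+ selection.

    A node is "modified" if it is new or its checksum differs from production;
    the "+" adds every descendant reachable through child_map.
    """
    modified = {n for n, cs in dev_checksums.items() if prod_checksums.get(n) != cs}
    selected = set(modified)
    stack = list(modified)
    while stack:
        node = stack.pop()
        for child in child_map.get(node, []):
            if child not in selected:
                selected.add(child)
                stack.append(child)
    return selected


# Hypothetical checksums: only orders_plus changed on the feature branch.
prod = {"raw_orders": "a1", "orders_plus": "b2", "order_summary": "c3"}
dev = {"raw_orders": "a1", "orders_plus": "ff", "order_summary": "c3"}
children = {"raw_orders": ["orders_plus"], "orders_plus": ["order_summary"], "order_summary": []}

print(sorted(modified_plus(prod, dev, children)))  # ['order_summary', 'orders_plus']
```

Skipping unchanged upstream models is what keeps CI runs short on large projects.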
Galaxy’s desktop SQL editor autocompletes dbt models, lets you test queries against staging data, and shares endorsed transformations with your team.
Data teams must deliver trustworthy, maintainable datasets. dbt enforces version control, automated testing, and documentation, turning fragile SQL scripts into governed, production-grade pipelines. Adopting dbt accelerates feature delivery, simplifies onboarding, and integrates smoothly with modern warehouses, making it a cornerstone of analytics engineering.
dbt is not limited to analysts. Data engineers and analytics engineers also use it to maintain ELT pipelines with software-engineering rigor.
You can run dbt from Docker, Airflow, or any CI/CD runner using the open-source CLI.
Galaxy reads your target schema, autocompletes ref() models, and lets teams endorse tested queries before committing them to the dbt repo.
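dbt supports incremental models. Add {{ config(materialized='incremental') }} at the top of the model and wrap a filter in an is_incremental() block so that, after the first build, each run processes only new records.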
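In a dbt model the filter usually looks like {% if is_incremental() %} where updated_at > (select max(updated_at) from {{ this }}) {% endif %}, where updated_at is an assumed timestamp column. The Python sketch below mimics that behavior with sqlite3 to show why reruns touch only new rows; the table names, columns, and the sqlite_master existence check are illustrative assumptions, not how dbt's adapters implement the materialization.

```python
import sqlite3


def run_incremental(conn: sqlite3.Connection, target: str, source: str) -> None:
    """Mimic an incremental materialization keyed on updated_at.

    First run: build the target from all source rows (like a full build).
    Later runs: insert only source rows newer than the target's max(updated_at),
    the same filter an is_incremental() block typically applies.
    """
    exists = conn.execute(
        "select 1 from sqlite_master where type = 'table' and name = ?", (target,)
    ).fetchone()
    if exists is None:
        conn.execute(f"create table {target} as select * from {source}")
    else:
        conn.execute(
            f"insert into {target} select * from {source} "
            f"where updated_at > (select max(updated_at) from {target})"
        )


conn = sqlite3.connect(":memory:")
conn.execute("create table raw_orders (id integer, total real, updated_at integer)")
conn.executemany("insert into raw_orders values (?, ?, ?)", [(1, 5.0, 1), (2, 25.0, 2)])

run_incremental(conn, "orders_inc", "raw_orders")  # first run copies both rows
conn.execute("insert into raw_orders values (3, 40.0, 3)")
run_incremental(conn, "orders_inc", "raw_orders")  # second run adds only id 3

print(conn.execute("select id from orders_inc order by id").fetchall())  # [(1,), (2,), (3,)]
```

When the model logic changes, dbt run --full-refresh rebuilds the incremental table from scratch.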