Automating the execution of dbt projects on a defined cadence in a production environment using tools such as dbt Cloud, cron, Airflow, Prefect, Dagster, or CI/CD pipelines.
dbt (data build tool) lets analytics engineers transform data in their warehouse with modular SQL and tested, version-controlled code. Yet writing models is only half the story—those models must run reliably on a schedule so fresh data is always available to downstream dashboards and applications. Production scheduling turns a local dbt project into an always-on data service.
dbt Cloud, the SaaS platform from dbt Labs, includes a UI to create Jobs, set cron expressions, specify environments, and trigger notifications. It handles concurrency, logging, retries, and artifact storage out of the box.
For teams that prefer minimal dependencies, a cron entry on a virtual machine (or a Kubernetes CronJob) can call dbt run. Combine it with dbt test and alerting scripts. Cron is simple but offers no native DAG-level visibility or retry logic.
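As a minimal sketch, a crontab entry might look like the line below; the project path and log location are illustrative assumptions, not prescribed values.

# Run dbt at 02:00 daily against the prod target, then test; append all output to a log file.
0 2 * * * cd /opt/dbt_project && dbt run --target prod >> /var/log/dbt/run.log 2>&1 && dbt test --target prod >> /var/log/dbt/run.log 2>&1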
Airflow’s DAG paradigm maps well to dbt: each model, test, or macro can be an operator, or you can treat the entire run as a single Bash/DbtRunOperator task. Airflow provides rich scheduling, dependencies, retries, SLAs, and UI monitoring.
Prefect’s Orion engine (v2) emphasizes Pythonic flows. The prefect-dbt collection offers tasks like DbtCoreOperation for CLI projects and DbtCloudJobRun for dbt Cloud. Prefect Cloud adds auto-scaling and observability.
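A rough sketch of a Prefect flow wrapping a dbt Core run is shown below; it assumes prefect-dbt is installed, and the project and profiles directories are placeholder paths.

from prefect import flow
from prefect_dbt.cli.commands import DbtCoreOperation

@flow
def dbt_daily_prod():
    # Run and test dbt against the prod target; project_dir and profiles_dir are hypothetical paths.
    DbtCoreOperation(
        commands=["dbt run --target prod", "dbt test --target prod"],
        project_dir="/opt/dbt_project",
        profiles_dir="/opt/dbt_project/profiles",
    ).run()

if __name__ == "__main__":
    dbt_daily_prod()

The flow can then be given a cron or interval schedule through a Prefect deployment.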
Dagster treats data assets as first-class citizens. The software-defined assets API can map directly to dbt models, generating lineage and materialization metadata that syncs with dbt’s own manifest.
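A minimal sketch using the dagster-dbt integration follows; it assumes the project has been compiled so that target/manifest.json exists, and the paths are illustrative.

from dagster import AssetExecutionContext, Definitions
from dagster_dbt import DbtCliResource, dbt_assets

# Load every dbt model in the compiled manifest as a Dagster asset.
@dbt_assets(manifest="/opt/dbt_project/target/manifest.json")
def prod_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    # Stream events from `dbt build` so Dagster records per-model materializations.
    yield from dbt.cli(["build"], context=context).stream()

defs = Definitions(
    assets=[prod_dbt_assets],
    resources={"dbt": DbtCliResource(project_dir="/opt/dbt_project")},
)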
Some teams trigger dbt from GitHub Actions, GitLab CI, or CircleCI. A push to main can spin up a runner, install dependencies, and execute dbt run. This approach leverages existing DevOps investment but may lack stateful scheduling features.
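As a sketch, the shell steps such a pipeline might run on each push look roughly like this (the adapter and profiles path are assumptions to adjust for your stack):

# Hypothetical CI runner steps: install dbt, pull packages, build and test.
pip install dbt-core dbt-bigquery
dbt deps
dbt run --target prod --profiles-dir ./profiles
dbt test --target prod --profiles-dir ./profiles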
Add a connection named my_dbt_conn that points to your data warehouse. Store secrets like DBT_PROFILES_DIR in Airflow Variables or Vault.
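For example, using the Airflow CLI (the connection type and path below are placeholders for your warehouse setup):

airflow connections add my_dbt_conn --conn-type google_cloud_platform   # or snowflake, postgres, etc.
airflow variables set DBT_PROFILES_DIR /usr/local/airflow/dags/dbt_project/profiles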
pip install apache-airflow[google] # for BigQuery, adjust as needed
pip install dbt-core dbt-bigquery # match your adapter
pip install airflow-dbt # community dbt operators
from datetime import datetime, timedelta

from airflow import DAG
from airflow_dbt.operators.dbt_operator import DbtRunOperator, DbtTestOperator

default_args = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(seconds=300),  # wait 5 minutes between retries
}

dag = DAG(
    dag_id="dbt_daily_prod",
    schedule_interval="0 2 * * *",  # 2 AM daily
    start_date=datetime(2023, 1, 1),
    catchup=False,  # do not backfill missed intervals
    default_args=default_args,
    tags=["dbt", "production"],
)

# Build all models against the prod target.
run = DbtRunOperator(
    task_id="dbt_run",
    dir="/usr/local/airflow/dags/dbt_project",
    profiles_dir="/usr/local/airflow/dags/dbt_project/profiles",
    target="prod",
    dag=dag,
)

# Run schema and data tests after the models are built.
test = DbtTestOperator(
    task_id="dbt_test",
    dir="/usr/local/airflow/dags/dbt_project",
    profiles_dir="/usr/local/airflow/dags/dbt_project/profiles",
    target="prod",
    dag=dag,
)

# Tests run only if the dbt run succeeds.
run >> test
Configure email or Slack notifications on task_failure events. Use Airflow’s SLA feature to alert when runs exceed expected durations.
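One common pattern, sketched below, is an on_failure_callback attached via default_args that posts to a Slack incoming webhook; the "slack_webhook_url" Variable is a hypothetical secret you would manage yourself.

import json
import urllib.request

from airflow.models import Variable

def notify_slack_on_failure(context):
    # Assumes a Slack incoming-webhook URL stored as an Airflow Variable named "slack_webhook_url".
    webhook_url = Variable.get("slack_webhook_url")
    message = {
        "text": f"dbt task {context['task_instance'].task_id} failed for run {context['ds']}"
    }
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# Attach the callback to every task via default_args.
default_args = {
    "owner": "data-eng",
    "retries": 2,
    "on_failure_callback": notify_slack_on_failure,
}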
Use dbt build --select state:modified+ (paired with the --state flag pointing at artifacts from the last production run) to run only changed models during daytime micro-schedules. Define separate dev, staging, and prod targets in profiles.yml; only production schedules should hit the prod warehouse. Run dbt test after dbt run, and fail fast on broken assumptions.

Galaxy is a modern SQL editor with AI assistance. While Galaxy itself does not execute scheduled jobs, it is an ideal place to author and review the SQL models that power your dbt project. Its Copilot can refactor CTEs, suggest optimizations, and auto-document columns before you commit code that will later be orchestrated by dbt Cloud, Airflow, or other schedulers.
Scheduling turns dbt from a local development tool into a production-grade transformation layer. Whether you choose dbt Cloud for an all-in-one SaaS, Airflow for enterprise-grade orchestration, or cron for simplicity, follow best practices around testing, retries, secrets management, and monitoring. Your data consumers will thank you.
Without automated scheduling, dbt models have to be run manually, leading to stale data, unreliable analytics, and frustrated stakeholders. Proper orchestration brings predictability, observability, and cost control to your data transformations, allowing analytics engineering teams to focus on building models instead of babysitting them.
If you want a fully managed option, dbt Cloud’s Job Scheduler is the quickest path. It provides a UI, logs, retries, and notifications without deploying additional infrastructure.
Yes. A simple cron entry like 0 3 * * * dbt run --target prod can work, but you must manage logging, retries, and alerts yourself.
Use DbtRunOperator with the --select flag, or generate per-model tasks from the manifest to parallelize runs and reduce cost.
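A rough sketch of the manifest-driven approach is shown below; it assumes a compiled target/manifest.json, uses BashOperator for simplicity, and the paths and dag object are placeholders defined elsewhere in your DAG file.

import json

from airflow.operators.bash import BashOperator

DBT_DIR = "/usr/local/airflow/dags/dbt_project"

# Parse the compiled manifest to discover models and their upstream dependencies.
with open(f"{DBT_DIR}/target/manifest.json") as f:
    manifest = json.load(f)

tasks = {}
for node_id, node in manifest["nodes"].items():
    if node["resource_type"] != "model":
        continue
    tasks[node_id] = BashOperator(
        task_id=f"run_{node['name']}",
        bash_command=f"cd {DBT_DIR} && dbt run --select {node['name']} --target prod",
        dag=dag,  # the DAG object defined elsewhere in the file
    )

# Wire Airflow dependencies to mirror the dbt lineage graph.
for node_id, task in tasks.items():
    for upstream_id in manifest["nodes"][node_id]["depends_on"]["nodes"]:
        if upstream_id in tasks:
            tasks[upstream_id] >> task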
Absolutely. Galaxy’s AI Copilot helps you write, refactor, and document dbt models. Once committed to your repo, those models can be scheduled by dbt Cloud, Airflow, or other orchestrators.