Using Grafana dashboards to visualize, alert on, and operationalize the results of dbt test runs.
dbt’s built-in tests generate artifacts that describe the health of every model in your warehouse. By piping those artifacts into Grafana you can create always-on observability: dashboards, alerts, and SLAs that keep bad data from ever reaching production consumers.
dbt focuses on transforming data and validating assumptions at build time; Grafana excels at real-time visualization and alerting. Together they provide a full feedback loop for data quality.
The most common pattern looks like this:
1. Run every dbt invocation with --write-json so run_results.json and manifest.json are produced.
2. Parse the artifacts with a lightweight exporter (a Python script or dbt state operations) that extracts test status, model name, and timestamps.
3. Push the resulting metrics to a Prometheus Pushgateway.
4. Visualize and alert on those metrics in Grafana.
If you already run your transformation jobs in Kubernetes or Airflow you can attach a Prometheus side-car and skip step 3.
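A minimal sketch of steps 1 and 2, wiring a dbt run to a quick artifact check; the artifact path and field names follow dbt’s documented run_results.json layout, so verify them against your dbt version:

import json
import subprocess

# Step 1: run the test suite with JSON artifacts enabled.
# check=False because `dbt test` exits non-zero when any test fails,
# and the artifact is still written in that case.
subprocess.run(["dbt", "--write-json", "test"], check=False)

# Step 2: extract just enough information for monitoring.
with open("target/run_results.json") as f:
    artifact = json.load(f)

failing = [r["unique_id"] for r in artifact["results"] if r["status"] != "pass"]
print(f"{len(failing)} non-passing tests: {failing}")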
Add the following flag to every dbt invocation (--write-json is on by default in recent dbt versions, so this mainly guards against it being disabled):
dbt --write-json run
# or for tests
dbt --write-json test
The files will land in ./target/. To keep history, copy them to durable storage (S3, GCS) with a job ID in the path.
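For example, a sketch using boto3; the bucket name and key layout are placeholders to adapt, and the job ID is taken from the artifact’s invocation_id metadata:

import json
import boto3

# Use the run's invocation_id so every upload lands under a unique prefix.
with open("target/run_results.json") as f:
    invocation_id = json.load(f)["metadata"]["invocation_id"]

s3 = boto3.client("s3")
for artifact in ("run_results.json", "manifest.json"):
    # e.g. s3://my-dbt-artifacts/dbt-artifacts/<invocation_id>/run_results.json
    s3.upload_file(
        Filename=f"target/{artifact}",
        Bucket="my-dbt-artifacts",  # placeholder bucket
        Key=f"dbt-artifacts/{invocation_id}/{artifact}",
    )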
The minimal information you need for Grafana:
- test_name
- status (pass/fail/error/skip)
- execution_time (seconds)
- invocation_id or run_started_at
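As a sketch, those fields map onto dbt’s run_results.json like this (run_started_at is approximated here by the artifact’s generated_at metadata; confirm the field names against your dbt version):

import json

with open("target/run_results.json") as f:
    artifact = json.load(f)

meta = artifact["metadata"]
rows = [
    {
        "test_name": r["unique_id"],             # e.g. test.my_project.not_null_orders_id
        "status": r["status"],                   # pass / fail / error / skipped
        "execution_time": r["execution_time"],   # seconds
        "invocation_id": meta["invocation_id"],
        "run_started_at": meta["generated_at"],  # approximation, see note above
    }
    for r in artifact["results"]
]
print(rows[:3])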
Python example that converts run_results.json
into Prometheus gauge metrics:
import json

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()

status_gauge = Gauge(
    "dbt_test_status",
    "dbt test status (1=pass, 0=fail)",
    ["test_name"],
    registry=registry,
)
exec_gauge = Gauge(
    "dbt_test_execution_seconds",
    "Runtime of each dbt test run",
    ["test_name"],
    registry=registry,
)

# Parse the dbt artifact and set one labeled sample per test.
with open("target/run_results.json") as f:
    results = json.load(f)["results"]

for r in results:
    metric_value = 1 if r["status"] == "pass" else 0
    status_gauge.labels(test_name=r["unique_id"]).set(metric_value)
    exec_gauge.labels(test_name=r["unique_id"]).set(r["execution_time"])

# Push the whole registry to the Pushgateway in one request.
push_to_gateway("https://prometheus-push:9091", job="dbt_tests", registry=registry)
Once Prometheus scrapes or receives the pushed metrics:
- Use sum by (test_name) (dbt_test_status == 0) to show currently failing tests.
- Use changes(dbt_test_status[1d]) to track flakiness (how often a test flips between pass and fail).
- Add template variables ($environment, $project) to slice by workspace.
When an alert fires, run dbt test --select <failing_test> locally or in Galaxy’s SQL editor to debug.
Emitting only generic schema tests (not_null, unique) misses upstream freshness issues. Include a dbt source freshness run in the same job and send those results to Grafana as well.
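A sketch of exporting that freshness artifact (target/sources.json) the same way; the field names follow dbt’s sources.json schema but should be verified against your dbt version, and the gateway address is the same placeholder used above:

import json

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
freshness_gauge = Gauge(
    "dbt_source_freshness_seconds",
    "Seconds since the source was last loaded",
    ["source_name"],
    registry=registry,
)

# dbt writes freshness results to target/sources.json after `dbt source freshness`.
with open("target/sources.json") as f:
    results = json.load(f)["results"]

for r in results:
    staleness = r.get("max_loaded_at_time_ago_in_s")  # assumed field name
    if staleness is None:
        continue  # errored checks may not report staleness
    freshness_gauge.labels(source_name=r["unique_id"]).set(staleness)

push_to_gateway("https://prometheus-push:9091", job="dbt_source_freshness", registry=registry)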
Prometheus naming conventions call for lowercase snake_case names with the unit as a suffix, e.g. dbt_test_execution_seconds, not DbtTestExecTimeMs.
Add labels for env, team, and model_name so dashboards remain decipherable as projects grow.
Export dashboards as JSON and commit them to Git. Use Grafana’s Terraform provider or file-based dashboard provisioning for GitOps-style deployments.
Misinterpreting status values
Problem: Treating error the same as fail inflates incident count.
Fix: Map error to -1, fail to 0, pass to 1, and alert only on 0.
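A self-contained sketch of that mapping; the handling of skipped tests and the example test name are assumptions, not from the article:

from prometheus_client import CollectorRegistry, Gauge

registry = CollectorRegistry()
status_gauge = Gauge(
    "dbt_test_status",
    "dbt test status (1=pass, 0=fail, -1=error)",
    ["test_name"],
    registry=registry,
)

# Distinct values let alerts target genuine failures (0) without also firing
# on infrastructure errors (-1).
STATUS_VALUE = {"pass": 1, "fail": 0, "error": -1}

def record(test_name: str, status: str) -> None:
    value = STATUS_VALUE.get(status)
    if value is None:
        return  # e.g. skipped tests are left out of the metric (assumption)
    status_gauge.labels(test_name=test_name).set(value)

record("test.my_project.not_null_orders_id", "error")  # hypothetical test name

The Grafana alert rule then fires only on dbt_test_status == 0.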
Problem: Pushing thousands of metrics every minute leads to high cardinality and OOM errors.
Fix: Push once per job, drop unneeded series with metric_relabel_configs, or switch to a remote_write-compatible backend.
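One way to keep the Pushgateway itself tidy is to push once per job run under a grouping key and delete a group when it is retired; a sketch using prometheus_client (the gateway address and env values are placeholders):

from prometheus_client import CollectorRegistry, Gauge, delete_from_gateway, push_to_gateway

GATEWAY = "https://prometheus-push:9091"  # placeholder, as in the exporter above

registry = CollectorRegistry()
heartbeat = Gauge(
    "dbt_job_last_run_timestamp",
    "Unix time of the last dbt job run",
    registry=registry,
)
heartbeat.set_to_current_time()

# Push exactly once per job run, grouped by environment so the series set stays bounded.
push_to_gateway(GATEWAY, job="dbt_tests", grouping_key={"env": "prod"}, registry=registry)

# When an environment is retired, delete its whole metric group instead of
# letting dead series accumulate on the Pushgateway.
delete_from_gateway(GATEWAY, job="dbt_tests", grouping_key={"env": "staging"})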
Problem: Only current status is displayed; patterns are missed.
Fix: Store artifacts in S3 and backfill metrics with a nightly batch job. Visualize 30-day MTTR (mean time to resolution).
Galaxy’s desktop SQL editor is the fastest way to reproduce a failing dbt test.
Because Galaxy keeps a history of every query, you can track what changed between the failing and passing versions.
By exporting dbt test artifacts to Prometheus and visualizing them in Grafana, teams gain real-time insight into data health, shorten incident resolution time, and build trust in their analytics layer. The approach is lightweight, open-source friendly, and scales from a single cron job to thousands of daily runs. With tools like Galaxy accelerating SQL diagnostics, the feedback loop becomes even tighter, ensuring that data engineers catch quality issues long before they reach stakeholders.
dbt tests guard against bad data, but without real-time monitoring failures can go unnoticed for hours, breaking dashboards and ML models. Integrating dbt with Grafana closes this observability gap, providing instant visibility and automated alerts that maintain trust in the data platform.
Yes. Grafana Cloud includes hosted Prometheus and Alertmanager. Have the Prometheus or Grafana Agent instance that scrapes your Pushgateway forward metrics to the remote_write endpoint, and the dashboards work the same.
Most teams run a full test suite hourly or after every warehouse load. Critical source freshness tests can run every 5 minutes.
Add a PythonOperator after the DbtRunOperator to parse artifacts and push metrics. Airflow’s own StatsD metrics can also be bridged into Prometheus with the statsd exporter.
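A hedged sketch of that wiring, assuming the airflow-dbt package’s operators and an export_dbt_metrics() helper like the exporter shown earlier (DAG name, schedule, and paths are illustrative):

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow_dbt.operators.dbt_operator import DbtRunOperator, DbtTestOperator


def export_dbt_metrics():
    # Placeholder: parse target/run_results.json and push gauges to the
    # Pushgateway, as in the exporter script earlier in this article.
    ...


with DAG(
    dag_id="dbt_with_metrics",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",  # schedule_interval on older Airflow versions
    catchup=False,
) as dag:
    run = DbtRunOperator(task_id="dbt_run")
    test = DbtTestOperator(task_id="dbt_test")
    export = PythonOperator(
        task_id="export_metrics",
        python_callable=export_dbt_metrics,
        trigger_rule="all_done",  # export metrics even when tests fail
    )

    run >> test >> export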
Galaxy’s SQL editor lets you reproduce the failing query instantly with AI-powered suggestions, so you can debug and ship a fix before the next job runs.