This resource demystifies data observability—what it is, why it matters, and how to implement it. You’ll learn the five pillars, explore real-world use cases, walk through a step-by-step implementation plan, and complete a hands-on freshness-monitoring exercise using SQL (and Galaxy).
Data observability is the ability to fully understand the health, reliability, and lineage of data in your pipelines. Just as DevOps teams rely on application observability (logs, metrics, traces) to detect issues before users notice, data teams use observability to catch broken transformations, schema drifts, and stale tables before executives make a bad decision.
In short, data observability answers two questions:
Most observability platforms—open-source or commercial—track at least these five signal categories:
How recently was the data updated? Monitor timestamps to ensure tables meet refresh SLAs (e.g., <15 min latch).
Do row counts or file sizes deviate from historical norms? Sudden spikes or drops can indicate upstream failures.
Have columns been added, removed, or type-changed? Schema drift silently breaks downstream queries.
What dependencies feed a dataset, and which dashboards depend on it? Lineage graphs help trace root causes quickly.
Are null rates, uniqueness constraints, or business logic tests (e.g., revenue >= 0
) within tolerance?
Although vendors differ, most systems share the same building blocks:
Visual description: A layered diagram showing data sources on the left, an observability agent collecting metrics, a metrics store in the middle, a rules engine above it, and alert channels on the right.
List tables, views, materialized views, and external files. Tools like information_schema.tables
in SQL or open-source catalogs (e.g., Amundsen) help. In Galaxy, you can run:
SELECT table_schema, table_name, row_estimate
FROM pg_catalog.pg_stat_user_tables;
Choose Service Level Indicators (SLIs) such as freshness < 15 min and null_rate < 1%. Publish Service Level Agreements so stakeholders know expectations.
Because Galaxy stores and versions queries, you can save metric-collection SQL in a shared Observability Collection:
-- row_count_daily_orders.sql
INSERT INTO observability.row_counts
SELECT CURRENT_DATE AS as_of,
'core.orders' AS table_name,
COUNT(*) AS row_count
FROM core.orders;
Set this query to run via your scheduler (Airflow, Dagster) and Galaxy will track query history and performance.
Simple thresholds work initially:
SELECT *
FROM observability.row_counts rc
JOIN observability.row_counts rc_prev
ON rc_prev.table_name = rc.table_name
AND rc_prev.as_of = rc.as_of - INTERVAL '1 day'
WHERE ABS(rc.row_count - rc_prev.row_count) > 0.3 * rc_prev.row_count;
Flag results in Galaxy and send alerts to Slack via webhook integrations.
When an alert fires, check Galaxy’s Run History to compare query plans and recent schema changes. Because Galaxy links directly to GitHub, you can trace whether a recent dbt PR altered the affected model.
Follow these instructions in your own workspace (or a demo Postgres DB):
observability
to store metrics.-- Create table to store freshness snapshots
CREATE TABLE IF NOT EXISTS observability.freshness (
table_name TEXT PRIMARY KEY,
last_loaded TIMESTAMP NOT NULL,
captured_at TIMESTAMP DEFAULT NOW()
);
-- Capture freshness for a target table
INSERT INTO observability.freshness (table_name, last_loaded)
SELECT 'core.users', MAX(updated_at)
FROM core.users;
NOW() - last_loaded > INTERVAL '30 minutes'
.#data-alerts
.🎉 Congrats—your first observability check is live! Extend it by adding row-count monitoring, schema diffs (using information_schema.columns
), or dbt tests.
E-commerce: A retailer uses freshness alerts to ensure product catalog data (core.products
) syncs every 5 minutes. Sales operations rely on up-to-date inventory when generating flash-sale pricing.
SaaS Startup: Analytics engineers monitor schema drift on user-event tables. Galaxy’s AI copilot automatically refactors queries when new columns appear, reducing incident response from hours to minutes.
Financial Services: Compliance teams audit every change to PII tables. Galaxy’s permissioning ensures only approved users can edit observability checks, satisfying SOC 2 controls.
great_expectations
or dbt-tests
and trigger them from Galaxy.