Data mesh is a decentralized data architecture and organizational approach that treats data as a product, owned and served by cross-functional domain teams through standardized interfaces.
A data mesh is more than an architectural pattern—it is an operating model that decentralizes data ownership to the teams that know the data best, while providing a common set of self-service platforms, governance standards, and interoperability protocols so that the data can be trusted and reused across the organization. Coined by Zhamak Dehghani in 2019, data mesh challenges the traditional, centralized data lake or warehouse by distributing responsibility, thereby aiming to reduce bottlenecks, improve data quality, and accelerate analytics.
For years, enterprises funneled all operational data into a single monolithic platform—first EDWs, then Hadoop clusters, and lately cloud data lakes or lakehouses. While the unified store simplified access in early stages, growth often exposes cracks: the central data team becomes a bottleneck, analytics slows as request queues grow, and quality erodes because the engineers moving the data lack the domain context of the teams that produced it.
Data mesh proposes a paradigm shift to solve these challenges.
Data is produced—and therefore understood—by the teams that build products or services. A data mesh hands ownership of data products to these domain teams. Ownership means accountability for quality, documentation, lineage, SLAs, and stakeholder support.
Raw log dumps and cryptically named columns no longer pass muster. Data should be discoverable, addressable, trustworthy, self-describing, secure, and interoperable (the DATSIS attributes). Each domain team treats its dataset like a customer-facing API, versioning changes and providing service guarantees.
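To make the API analogy concrete, the sketch below shows one way a domain team might describe a data product's published interface, with a minimal gate that blocks undocumented products. The class and field names are illustrative assumptions, not part of any standard.

```python
from dataclasses import dataclass

# Hypothetical sketch: the metadata a domain team could publish alongside a
# data product to make it discoverable, addressable, and self-describing.
# Field names are illustrative assumptions, not a standard.
@dataclass
class DataProduct:
    name: str                 # addressable identifier, e.g. "orders.orders_daily"
    owner: str                # accountable domain team
    version: str              # semantic version; breaking schema changes bump major
    description: str          # self-describing documentation
    schema: dict              # column name -> type
    freshness_sla_hours: int  # service guarantee consumers can rely on

    def is_publishable(self) -> bool:
        """Minimal quality gate: refuse to publish undocumented products."""
        return bool(self.description) and bool(self.schema) and self.freshness_sla_hours > 0

orders = DataProduct(
    name="orders.orders_daily",
    owner="orders-team",
    version="2.1.0",
    description="One row per order, materialized daily.",
    schema={"order_id": "STRING", "order_date": "DATE", "total_order_value": "NUMERIC"},
    freshness_sla_hours=24,
)
assert orders.is_publishable()
```

Versioning the descriptor alongside the data lets consumers depend on a contract rather than on whatever shape the table happens to have today.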
Decentralization must not result in a Wild West of bespoke pipelines. A dedicated platform team offers a paved road—CI/CD templates, cataloging, lineage capture, observability, and governance guardrails—so domain teams can focus on business logic rather than infrastructure wrangling.
Compliance, security, and interoperability rules are encoded into the platform through policy-as-code, automated quality gates, and shared vocabularies. A federated governance council with representatives from each domain evolves standards without re-centralizing work.
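As a rough illustration of policy-as-code, the sketch below expresses federated governance rules as plain functions that a CI pipeline could run before a data product is published. The rule names, the descriptor shape, and the PII keyword list are assumptions made for the example.

```python
# Hypothetical policy-as-code sketch: governance rules as plain functions
# that run in CI before a data product can be published. The descriptor
# shape and rule set are illustrative assumptions.

def check_has_owner(product: dict) -> list[str]:
    # Every product must declare an accountable domain team.
    return [] if product.get("owner") else ["product must declare an owning team"]

def check_pii_tagged(product: dict) -> list[str]:
    # Shared vocabulary: columns whose names suggest PII must carry a "pii" tag.
    violations = []
    for col in product.get("columns", []):
        looks_pii = any(k in col["name"] for k in ("email", "phone", "ssn"))
        if looks_pii and "pii" not in col.get("tags", []):
            violations.append(f"column {col['name']} looks like PII but is untagged")
    return violations

POLICIES = [check_has_owner, check_pii_tagged]

def evaluate(product: dict) -> list[str]:
    """Run every federated policy; an empty list means the gate passes."""
    return [v for policy in POLICIES for v in policy(product)]

product = {
    "owner": "orders-team",
    "columns": [
        {"name": "customer_email", "tags": ["pii"]},
        {"name": "order_date", "tags": []},
    ],
}
assert evaluate(product) == []  # passes the governance gate
```

Because the rules live in code, the federated governance council can evolve them through ordinary pull requests, and every domain inherits the change automatically.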
A typical cloud-native data mesh stacks multiple layers: self-service infrastructure for storage and compute at the base, pipeline tooling and CI/CD templates above it, then cataloging, lineage capture, and observability, with governance guardrails enforced across every layer.
Consider an e-commerce company with the following domains: Orders, Catalog, and Payments. Under a data mesh:

- The Orders team owns the `orders` data product, materialized daily into an Iceberg table partitioned by `order_date`. They publish a semantic model with metrics like `total_order_value`.
- The Catalog team owns the `products` and `inventory_levels` datasets.
- The Payments team owns `transactions` and exposes an event stream for real-time fraud detection.
- Analysts compute GMV by querying `orders` and `transactions` without filing tickets.

Data fabric is largely technology-driven, emphasizing integration tooling and smart middleware. Data mesh is primarily organizational and cultural, though it leans on modern tech. In practice, organizations often blend both: a shared data fabric platform enabling a mesh operating model.
The snippet below shows how the Orders team might publish its data product using dbt within a CI pipeline:
```yaml
# .github/workflows/orders_data_product.yml
name: Build Orders Data Product
on:
  push:
    paths:
      - models/orders/**
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - name: Install dependencies
        run: |
          pip install dbt-bigquery==1.7.0 dbt-expectations great_expectations
      - name: Run dbt tests
        run: |
          dbt test --select orders*
      - name: Build models
        run: |
          dbt run --select orders*
      - name: Publish lineage to OpenMetadata
        run: |
          curl -X POST "$OM_API/lineage" -d @target/manifest.json
```
While Galaxy is not a data mesh platform in itself, its fast, collaborative SQL editor makes it easier for distributed domain teams to explore, validate, and share their data products. With Galaxy Collections, a domain team can endorse canonical queries—for example, a query defining `total_order_value`—making them discoverable across the mesh. The context-aware AI copilot helps maintain query accuracy even as underlying schemas evolve, reducing friction between domains.
Data mesh shifts the center of gravity from a monolithic data team to empowered domain squads, supported by robust self-service platforms and federated governance. When done right, it unlocks scalable, high-quality data products and fosters a culture of shared responsibility and rapid insight.
As organizations scale, centralized data teams become bottlenecks, leading to slow analytics and poor data quality. Data mesh offers a scalable alternative by distributing ownership to domain experts while enforcing governance through self-service platforms. Understanding data mesh helps data engineers design architectures that empower teams, enhance trust, and accelerate decision-making.
Data mesh addresses the bottlenecks and quality issues stemming from centralized data teams by decentralizing ownership, thereby improving agility and domain relevance.
No, a data mesh is not the same as a lakehouse. A lakehouse is an architectural pattern for unified storage and processing, whereas data mesh is an organizational approach that can use a lakehouse as one of its technologies.
Yes, tools like Galaxy can support a mesh. Galaxy's collaborative SQL editor allows distributed domain teams to build, endorse, and share queries tied to their data products, embodying the data-as-a-product mindset.
Adopting a data mesh requires mature engineering practices—CI/CD for data, observability, data cataloging—and a culture open to decentralized ownership.