
Data Catalog vs Metadata Layer vs Semantic Layer: Where Governance Actually Lives

Feb 23, 2026

Data Governance

Most data teams have invested in at least one governance tool. Many have invested in three. And yet the same questions keep surfacing in Slack: "Where is the canonical definition of churn?" "Which table is the source of truth for revenue?" "Can I trust this metric in the board deck?"

The problem is rarely a lack of tooling. It is a lack of clarity about which layer is responsible for which governance job. Data catalogs, metadata layers, and semantic layers each solve real problems, but their boundaries blur in practice, and the gaps between them are where trust breaks down. This guide maps governance responsibilities to the layer that can actually own them, then shows where a system-of-meaning like Galaxy connects the pieces.

Who this is for

If you lead a data platform team, manage analytics engineering, or are responsible for data governance strategy, you are the audience. The guide assumes familiarity with modern data stacks and focuses on architectural decisions, not product demos.

The short version: definitions in 60 seconds

Data catalog: A discovery and stewardship interface. It helps people find datasets, understand ownership, and browse lineage. Think of it as the library card catalog for your data estate.

Metadata layer: The infrastructure that captures, normalizes, and stores metadata (technical, operational, business) across systems. It feeds the catalog, the semantic layer, and increasingly, AI agents.

Semantic layer: A query abstraction that centralizes metric definitions, join logic, and access rules so every downstream consumer gets consistent answers. dbt, for example, positions its semantic layer as a way to define metrics on top of existing models and automatically handle joins.

Galaxy sits between and across these layers as a living model of the business: an ontology-driven knowledge graph that makes entities, relationships, and meaning explicit. Where a catalog shows you what exists and a semantic layer tells a BI tool how to query, Galaxy captures why things are connected and how the business actually operates.

The core problem: governance gets split across tools

Teams ship a catalog for discovery, a semantic layer for metric consistency, and a metadata platform for lineage. Each tool governs a slice. Nobody governs the whole.

The result is a governance surface area with seams. A glossary term lives in the catalog but is not linked to the metric definition in the semantic layer. Lineage shows column-level flow but cannot explain whether two entities in different systems represent the same customer. Access policies are enforced at the warehouse level but documented nowhere.

Galaxy targets exactly this gap: the shared context that connects a glossary term to an entity, an entity to a metric, and a metric to the provenance chain that makes it trustworthy. Without that connective tissue, governance is a collection of checklists instead of a functioning system.

What a data catalog is (and isn't)

A data catalog is primarily a discovery UX. It answers: "What data do we have, who owns it, and where does it live?" Good catalogs also surface lineage views, quality scores, and usage statistics. Google Dataplex, for instance, frames its business glossary capability as a way to streamline discovery and reduce ambiguity, leading to better governance.

What typically lives in a catalog

Inventory and classification: Tables, views, dashboards, pipelines, and their technical metadata.
Ownership and stewardship: Who is responsible for a dataset and who approved its use.
Glossary surfacing: Business terms mapped to physical assets so analysts can search by concept, not table name.
Lineage views: Visual graphs showing how data moves from source to consumption.

Galaxy complements catalog workflows by providing richer entity-level semantics that catalogs can surface. Rather than treating a glossary as a flat list of terms, Galaxy models terms as nodes in a knowledge graph with typed relationships to other entities, metrics, and systems.

What catalogs are bad at

Catalogs are good at capturing a definition at a point in time but struggle to keep it current. Definitions go stale because stewardship is manual. A glossary entry might say "active customer" means "logged in within 30 days," but if the underlying logic changes in dbt, the catalog entry drifts.

Catalogs also lack execution capability. They can show you a metric definition, but they cannot enforce that the definition is used consistently across BI tools. Galaxy's living model approach addresses drift by linking definitions to the entities and logic they describe, so changes propagate rather than decay.
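One way to picture drift detection is a fingerprint comparison: store a hash of the logic a glossary entry was last reviewed against, and flag the entry when the current logic no longer matches. The sketch below is illustrative only. `GlossaryEntry`, `fingerprint`, and `has_drifted` are hypothetical names, the SQL is invented for the example, and none of it is the API of Galaxy or any catalog.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(sql: str) -> str:
    """Hash SQL after collapsing whitespace and case, so cosmetic edits are ignored.

    Assumption: lowercasing is acceptable here; it would also lowercase string
    literals, which a production check would need to handle more carefully.
    """
    normalized = " ".join(sql.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass
class GlossaryEntry:
    term: str
    definition: str
    logic_fingerprint: str  # fingerprint of the logic at the last review


def has_drifted(entry: GlossaryEntry, current_sql: str) -> bool:
    """Return True when upstream logic differs from what the entry was reviewed against."""
    return fingerprint(current_sql) != entry.logic_fingerprint


# Hypothetical upstream logic at the time the glossary entry was written.
reviewed_sql = "SELECT user_id FROM logins WHERE login_at >= CURRENT_DATE - 30"
entry = GlossaryEntry(
    term="active customer",
    definition="Logged in within 30 days",
    logic_fingerprint=fingerprint(reviewed_sql),
)

# The window later changes to 14 days; the text definition is now stale.
changed_sql = "SELECT user_id FROM logins WHERE login_at >= CURRENT_DATE - 14"

print(has_drifted(entry, reviewed_sql))
print(has_drifted(entry, changed_sql))
```

A check like this only detects that something changed. Deciding whether the glossary text or the logic is correct still needs a steward.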

What a metadata layer is (and isn't)

A metadata layer (sometimes called a unified metadata layer or metadata platform) is the infrastructure that collects, normalizes, and stores metadata from every system in the stack. OpenMetadata, DataHub, Atlan, and Amundsen all operate in this space. The metadata layer feeds downstream tools (catalogs, governance dashboards, orchestrators) with structured information about your data assets.

The two kinds of metadata that matter most

Static metadata describes structure: column names, data types, table relationships, schema versions. It changes infrequently and is easy to capture.

Operational metadata describes behavior: query frequency, freshness timestamps, transformation lineage, error rates. It changes constantly and requires active ingestion pipelines.

Galaxy benefits from both. Static metadata provides the scaffolding for entity resolution (mapping "customer_id" in Salesforce to "user_id" in the product database). Operational metadata provides the signals Galaxy uses to keep its model current, flagging when relationships or definitions may need review.
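To make the "customer_id" to "user_id" mapping concrete, here is a minimal sketch of key-based entity resolution. It assumes both systems carry an email address that can serve as a shared match key, which the article does not state and which real data often violates. All names (`SourceRecord`, `resolve_entities`) and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    system: str    # e.g. "salesforce" or "product_db"
    id_field: str  # column that holds the system-local identifier
    local_id: str
    email: str     # assumed shared match key


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivially different emails match."""
    return email.strip().lower()


def resolve_entities(records: list[SourceRecord]) -> dict[str, dict[str, str]]:
    """Group records by normalized email.

    Returns {match_key: {"system.id_field": local_id}}.
    """
    resolved: dict[str, dict[str, str]] = {}
    for record in records:
        key = normalize_email(record.email)
        resolved.setdefault(key, {})[f"{record.system}.{record.id_field}"] = record.local_id
    return resolved


records = [
    SourceRecord("salesforce", "customer_id", "0015g00000XyZ", "Ada@Example.com"),
    SourceRecord("product_db", "user_id", "42", " ada@example.com"),
    SourceRecord("product_db", "user_id", "43", "grace@example.com"),
]
customers = resolve_entities(records)
print(customers["ada@example.com"])
```

Exact-key matching is the simplest possible approach. Fuzzy or probabilistic matching would be needed where no clean shared key exists.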

Lineage and provenance: the governance primitive people underestimate

Microsoft Purview defines data lineage as the lifecycle of data spanning its origin and movement over time across the data estate. Purview lists concrete use cases: troubleshooting and root cause analysis, debugging pipelines, data quality analysis, compliance, and impact analysis.

Lineage tells you where data came from. Provenance tells you why it exists, who produced it, and under what conditions. The W3C's PROV-O ontology provides a set of classes, properties, and restrictions to represent and interchange provenance information across systems.

Galaxy treats provenance as a first-class concern. Every entity, relationship, and definition in Galaxy's graph carries provenance metadata, making it possible for both humans and AI agents to reason about trustworthiness, not just availability.
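As a rough illustration of what provenance metadata on a single entity might look like, the sketch below builds subject-predicate-object triples using PROV-O property names (`prov:wasGeneratedBy`, `prov:wasAttributedTo`, `prov:wasDerivedFrom`, `prov:generatedAtTime`). The `ProvenanceRecord` class and every identifier in the example are hypothetical. This is not how Galaxy stores provenance, and a real implementation would likely use an RDF library rather than plain tuples.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceRecord:
    entity: str
    was_generated_by: str   # prov:wasGeneratedBy, points at an activity
    was_attributed_to: str  # prov:wasAttributedTo, points at an agent
    was_derived_from: list[str] = field(default_factory=list)  # prov:wasDerivedFrom
    generated_at_time: str = ""  # prov:generatedAtTime, ISO 8601 timestamp

    def to_triples(self) -> list[tuple[str, str, str]]:
        """Flatten the record into (subject, predicate, object) triples."""
        triples = [
            (self.entity, "rdf:type", "prov:Entity"),
            (self.entity, "prov:wasGeneratedBy", self.was_generated_by),
            (self.entity, "prov:wasAttributedTo", self.was_attributed_to),
        ]
        triples += [(self.entity, "prov:wasDerivedFrom", src) for src in self.was_derived_from]
        if self.generated_at_time:
            triples.append((self.entity, "prov:generatedAtTime", self.generated_at_time))
        return triples


# Hypothetical identifiers, for illustration only.
record = ProvenanceRecord(
    entity="metric:monthly_recurring_revenue",
    was_generated_by="activity:nightly_transform_run",
    was_attributed_to="agent:analytics_engineering",
    was_derived_from=["table:subscriptions", "table:invoices"],
    generated_at_time="2026-02-23T06:00:00Z",
)
triples = record.to_triples()
for triple in triples:
    print(triple)
```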

What a semantic layer is (and isn't)

A semantic layer sits between your data models and your consumption tools (BI, notebooks, APIs). It centralizes metric definitions, manages join paths, and enforces access rules so every consumer queries a consistent view. dbt's semantic layer, for example, eliminates duplicate coding by defining metrics once and reusing them across tools.

Metrics layer vs semantic layer vs "semantic model"

These terms overlap and are easily confused. A metrics layer is narrowly focused on metric definitions (revenue = sum of payments where status = 'completed'). A semantic layer is broader, encompassing metrics plus join logic, entity relationships, and access permissions. A semantic model is a specific object within tools like dbt that defines those relationships for a given domain.
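The revenue example can be written as a single shared definition that every consumer calls, which is the core idea of a metrics layer. This is a plain-Python sketch, not dbt syntax. The `revenue` function and the payment record shape are assumptions made for illustration.

```python
from decimal import Decimal


def revenue(payments: list[dict]) -> Decimal:
    """Revenue = sum of payment amounts where status is 'completed'."""
    return sum(
        (p["amount"] for p in payments if p["status"] == "completed"),
        Decimal("0"),
    )


# Hypothetical payment rows.
payments = [
    {"amount": Decimal("100.00"), "status": "completed"},
    {"amount": Decimal("25.00"), "status": "refunded"},
    {"amount": Decimal("50.00"), "status": "completed"},
]

# A dashboard and a notebook would both call the same function
# instead of each re-implementing the filter.
print(revenue(payments))
```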

Galaxy can ground metrics in governed entities. When a semantic layer defines "monthly recurring revenue," Galaxy provides the entity resolution and definition stability that tells you which "customer" and which "subscription" the metric refers to across systems.

What semantic layers are bad at

Semantic layers solve the "one metric, many tools" problem well. They are weaker at identity resolution (is this the same customer across three systems?), provenance tracking (why was this metric defined this way, and what changed?), and cross-domain context (how does customer churn relate to support ticket volume and contract terms?).

These gaps matter more as organizations adopt AI agents that need to reason across domains, not just execute predefined queries. Galaxy functions as the cross-system context layer that gives semantic layers a stable foundation of entities and relationships.

Where governance actually lives: a responsibility map

The table below maps common governance responsibilities to the layer best positioned to own them. The "Galaxy role" column shows where ontology-driven entity modeling fills gaps.

Governance responsibility	Data catalog	Metadata layer	Semantic layer	Galaxy role
Asset discovery and search	Primary owner	Feeds catalog	References assets	Links assets to entities and meaning
Business glossary and definitions	Surfaces terms	Stores terms	References terms	Links terms to entities, relationships, and logic
Data lineage and provenance	Visualizes lineage	Primary owner (capture)	Contributes metric lineage	Adds entity-level provenance and reasoning chains
Metric definitions and KPI consistency	Documents metrics	Stores metric metadata	Primary owner	Provides entity resolution and definition stability
Access policy and entitlements	Documents policies	Stores policy metadata	Enforces at query time	Documents intent and propagates policy context
Data quality and incident response	Surfaces quality scores	Captures quality signals	N/A (consumption-side)	Speeds impact analysis via entity relationships
Entity resolution and identity	Limited	Stores match keys	Limited	Primary owner: resolves entities across systems

Business glossary and definitions

OpenMetadata defines a glossary as a controlled vocabulary that adds semantics or meaning to data by defining business terminologies. DataHub similarly frames its glossary as a way to organize assets using a shared vocabulary, mapping standardized concepts to physical assets.

Galaxy extends glossary functionality by connecting terms to typed entities in a knowledge graph. "Customer" is not just a term with a text definition; it is a node with relationships to "Subscription," "Contract," "Support Ticket," and "Invoice," each carrying their own semantics and provenance.

Access policy and entitlements

Access enforcement happens at the warehouse, the semantic layer, and sometimes the API gateway. No single layer owns the full picture. Galaxy's contribution is documenting and propagating the intent behind access policies, so a downstream consumer or agent can understand not just whether access is granted but why certain data is restricted.

Metric definitions and KPI consistency

Metric logic belongs in the semantic layer. But metric stability depends on the entities underneath. If "active user" changes meaning because the product team redefines activation, the metric breaks even though its SQL did not change. Galaxy provides a stable entity layer that surfaces definition changes before they silently cascade.

Data quality and incident response

When a data quality incident occurs, root cause analysis depends on lineage. But lineage alone shows column flow, not business impact. Galaxy's entity relationships enable impact analysis at the business level: "This broken pipeline affects the 'order' entity, which feeds the revenue metric, which appears in the board deck."
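The impact chain in that example is a graph traversal. The sketch below walks downstream edges breadth-first from a broken pipeline to everything it affects. The node names and the `DOWNSTREAM` adjacency map are hypothetical and hand-written here; in practice they would come from lineage and entity metadata.

```python
from collections import deque

# Hypothetical downstream edges: node -> things that depend on it.
DOWNSTREAM = {
    "pipeline:orders_load": ["entity:order"],
    "entity:order": ["metric:revenue"],
    "metric:revenue": ["report:board_deck"],
    "report:board_deck": [],
}


def impacted(start: str, graph: dict[str, list[str]]) -> list[str]:
    """Return every node reachable downstream of `start`, in breadth-first order."""
    affected: list[str] = []
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in visited:
                visited.add(child)
                affected.append(child)
                queue.append(child)
    return affected


print(impacted("pipeline:orders_load", DOWNSTREAM))
```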

Reference architecture: how the layers fit together

Galaxy sits between metadata capture and semantic consumption. It ingests metadata from the metadata layer, models entities and relationships as a knowledge graph, and exposes that context to catalogs, semantic layers, BI tools, and AI agents.

A practical "minimum viable governance" stack

For teams starting out, the minimum viable governance stack includes:

A metadata layer that captures schema, lineage, and operational metadata from warehouses, orchestrators, and transformation tools.
A data catalog that surfaces discovery, ownership, and glossary terms to analysts.
A semantic layer that centralizes metric definitions for BI tools.
Galaxy as the entity model that links glossary terms to metric definitions, resolves identities across systems, and reduces manual stewardship overhead.

Without Galaxy or something like it, stewardship remains a manual process where someone periodically reconciles glossary entries with metric logic and pipeline changes. Galaxy automates the linkage so definitions stay connected to the systems they describe.

An "AI and agents" ready stack

AI agents need more than SQL access. They need stable identifiers for business entities, consistent definitions, and provenance chains that explain where answers come from. The W3C's PROV-O model provides the interoperability standard for provenance, and Galaxy implements these concepts as durable context for agents.

An agent-ready stack adds:

Entity resolution so agents can reason about "this customer" across CRM, billing, and product data.
Provenance-aware context so agents can cite their sources and explain their reasoning.
Ontology-driven semantics so agents understand relationships (a subscription belongs to a customer, a customer belongs to an account) without hardcoded logic.
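The "belongs to" example in the last item can be sketched as a declared relationship map that an agent walks, rather than logic hardcoded per query. The `BELONGS_TO` map and `ancestors` function are hypothetical and assume each entity type has at most one parent, which a real ontology would not require.

```python
# Hypothetical ontology fragment: child entity type -> parent entity type.
BELONGS_TO = {
    "subscription": "customer",
    "customer": "account",
}


def ancestors(entity_type: str, belongs_to: dict[str, str]) -> list[str]:
    """Follow belongs-to edges upward and return the chain of parents."""
    chain: list[str] = []
    seen = {entity_type}
    current = entity_type
    while current in belongs_to:
        current = belongs_to[current]
        if current in seen:  # guard against cycles in a malformed map
            break
        seen.add(current)
        chain.append(current)
    return chain


# The agent can infer that a subscription ultimately rolls up to an account
# without that rule being written anywhere explicitly.
print(ancestors("subscription", BELONGS_TO))
```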

Galaxy serves as the durable context layer that gives agents a structured, trustworthy view of the business rather than a flat table scan.

Evaluation checklist: what to ask vendors and internal teams

Catalog checklist

Does the catalog support automated metadata ingestion, or does it rely on manual entry?
Can stewardship workflows trigger when definitions drift from upstream logic?
Is the glossary linked to physical assets or just a standalone wiki?
Can catalog metadata be operationalized (queried via API, consumed by other tools)?

Metadata layer checklist

How broad is ingestion coverage? Does it capture metadata from warehouses, orchestrators, BI tools, and custom pipelines?
Does the platform capture operational metadata (freshness, query patterns, error rates), or only static schema?
Is metadata queryable and graph-ready, or locked in a proprietary store?
Can the metadata layer feed a knowledge graph like Galaxy for entity-level modeling?

Semantic layer checklist

Are metric definitions version-controlled and auditable?
Does the semantic layer manage join logic and entity relationships, or only aggregation logic?
Are access permissions enforced at query time?
Can metrics be linked to governed entities and glossary terms, or are they isolated definitions?

If your use case involves cross-system entity resolution, provenance tracking, or AI agent grounding, evaluate whether Galaxy-style context modeling is a requirement, not an optional add-on.

Common failure modes (and how to avoid them)

Glossary rot. Teams build a glossary during a governance initiative, then abandon it as definitions drift. Galaxy addresses glossary rot by linking terms to live entities and surfacing when upstream changes invalidate a definition.

Metric fragmentation. Different BI tools compute the same KPI differently because the semantic layer was not adopted universally. Galaxy helps by providing entity-level grounding so metric definitions can be validated against a shared model regardless of which tool queries them.

Lineage without context. Lineage graphs show column-level flow but do not explain business impact. Galaxy's entity relationships add the business layer: which customers, which contracts, which revenue streams are affected.

Catalog as shelfware. A catalog deployed but never integrated into daily workflows becomes expensive documentation. Galaxy reduces this risk by making catalog content actionable through entity resolution and relationship navigation.

Semantic layer without identity. A semantic layer that defines metrics without resolving entity identity across systems produces internally consistent but externally incoherent results. Galaxy provides the identity layer that semantic layers typically lack.

FAQ

Do I need all three layers (catalog, metadata, semantic)? Most mature data organizations benefit from all three, but you do not need to buy them all on day one. Start with the metadata layer (it feeds everything else), add a semantic layer for metric consistency, and layer in a catalog for discovery. Galaxy can enter at any stage, providing the entity model that ties the other layers together.

Where does Galaxy fit in this picture? Galaxy is not a replacement for any of these layers. It is the connective tissue. Galaxy models your business as a knowledge graph of entities, relationships, and meaning, then connects that model to your catalog, your metadata layer, and your semantic layer. Think of it as the ontology that gives every other tool a shared frame of reference.

Where does a knowledge graph fit? A knowledge graph is a data structure, not a product category. Galaxy implements a knowledge graph as its core model. Other tools (Neo4j, Amazon Neptune, Stardog) provide graph databases, but they require you to build and maintain the ontology yourself. Galaxy provides both the graph infrastructure and the business-level semantic modeling.

Is a semantic layer the same as a "semantic model"? Not exactly. A semantic model (as used in dbt or Power BI) is a specific artifact that defines metrics and relationships for a domain. A semantic layer is the broader system that serves those definitions to downstream tools. Galaxy adds entity semantics on top: the identities, relationships, and provenance that semantic models reference but do not typically manage.

Suggested next steps

Audit your current stack. Map which governance responsibilities (from the table above) are currently owned, which are orphaned, and which are duplicated across tools.
Identify entity resolution gaps. Find the places where the same business concept (customer, order, subscription) exists in multiple systems without a shared identifier or definition.
Pilot Galaxy to unify entities, meaning, and provenance across your existing catalog and semantic layer. Start with one high-value domain (revenue, customer lifecycle) and expand from there.
Evaluate for AI readiness. If you plan to deploy AI agents against your data, assess whether your current stack provides the stable identifiers, consistent definitions, and provenance chains that agents require. Galaxy is designed to be that durable context layer.
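The audit in the first step can start as a simple classification of each responsibility by how many tools claim it. The sketch below uses responsibility names from the table earlier in this guide. The ownership data and tool names are invented for illustration, and `audit` is a hypothetical helper.

```python
def audit(ownership: dict[str, list[str]]) -> dict[str, list[str]]:
    """Classify responsibilities as owned (one tool), orphaned (none), or duplicated (several)."""
    report: dict[str, list[str]] = {"owned": [], "orphaned": [], "duplicated": []}
    for responsibility, owners in ownership.items():
        if not owners:
            report["orphaned"].append(responsibility)
        elif len(owners) == 1:
            report["owned"].append(responsibility)
        else:
            report["duplicated"].append(responsibility)
    return report


# Hypothetical current state: responsibility -> tools that claim to own it.
current_stack = {
    "Asset discovery and search": ["catalog"],
    "Business glossary and definitions": ["catalog", "wiki"],
    "Metric definitions and KPI consistency": ["semantic layer", "BI tool"],
    "Entity resolution and identity": [],
}

report = audit(current_stack)
for status, items in report.items():
    print(status, items)
```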

Governance does not live in any single tool. It lives in the connections between tools, and in the shared understanding that makes those connections trustworthy. Getting the layering right is the first step toward a data stack that humans and machines can both reason over with confidence.
