Identity Resolution and Entity Mastering for the Enterprise Context Layer

Every enterprise has the same dirty secret: the same customer, supplier, or product exists as dozens of slightly different records across CRM, ERP, billing, support, and procurement systems. Identity resolution is the process of determining whether different data records refer to the same real-world entity. Entity mastering is the process of deciding what the enterprise should trust once they do, producing a governed golden record that downstream systems can rely on.
Getting both right is a prerequisite for any master data management strategy and for any enterprise context layer that aims to support reliable analytics, retrieval, policy enforcement, and AI agent behavior.
Neo4j frames entity resolution as determining when different data records represent the same real-world entity. That sounds simple. In practice, it is a decision problem under uncertainty because exact identifiers are often missing, inconsistent, or overloaded across enterprise systems.
This article is part of the enterprise context strategy series. For the full system view, see the enterprise context strategy reference architecture. For the operational walkthrough, see the end-to-end enterprise context data flow. This article focuses on identity resolution and entity mastering inside the context layer, and connects directly to ontology management and semantic modeling and provenance and lineage for AI-ready enterprise context.
Why identity resolution belongs inside the context layer
Unresolved duplicates do more than waste storage. They fragment every system that consumes entity data. When a customer appears as three separate records, analytics double- or triple-count revenue, retrieval surfaces incomplete context, and AI agents generate conflicting answers depending on which record they happen to find first.
Policy enforcement breaks in similar ways. If a supplier entity is fragmented, spend limits apply to each fragment independently, and consolidated exposure is invisible. An enterprise context layer, the shared semantic infrastructure that feeds analytics, retrieval, and agent behavior, needs resolved identities to function as a coherent world model.
Identity resolution is where raw records become trustworthy entities. If this process is pushed outside the context layer into downstream consumers, each team builds its own resolution logic, and the enterprise ends up with competing, incompatible views of the same real-world objects.
Identity resolution vs entity mastering
These two concepts are related but distinct. Identity resolution answers the question: do these records describe the same thing? Entity mastering answers the follow-up: given that they do, what should the canonical enterprise representation look like?
Resolution is a linking problem. It compares records, evaluates evidence, and proposes sameness. Mastering is a governance problem. It selects trusted attribute values, assigns a stable identifier, and maintains that identity over time as sources change.
It is possible to resolve identities without mastering them by maintaining only the links. It is also possible to attempt mastering without proper resolution by simply picking a primary system. Neither shortcut produces a reliable MDM foundation or a trustworthy enterprise context layer. Both are needed, in sequence, with clear boundaries between them.
The core flow: match, merge, survivorship
Reltio's documentation describes match and merge as core MDM functions that move data toward a golden state where it serves as a single source of truth. That framing is useful, but it can obscure the fact that match, merge, and survivorship are three distinct stages with different inputs, controls, and failure modes.
Match
Matching is the comparison stage. It takes incoming or existing records and evaluates whether they are likely to represent the same entity. Deterministic matching uses exact key agreement, such as the same tax ID, email, or DUNS number. Probabilistic matching uses weighted evidence across multiple attributes, such as name similarity, address proximity, phone overlap, and behavioral signals.
IBM's MDM documentation likewise describes matching as the process of determining whether records can be collected into master data entities. In practice, most enterprises need both deterministic and probabilistic approaches because no single identifier is universally present or reliable across all source systems.
The output of matching is a set of candidate pairs or clusters with associated confidence scores, not a merged entity. Those candidates then move to merge decisions, either automatically or through human review.
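As a rough sketch of how deterministic and probabilistic matching combine, the snippet below short-circuits on an exact key and otherwise sums weighted attribute similarities. The field names, weights, and use of simple string similarity are illustrative assumptions, not a production matcher.

```python
# Hypothetical match scorer combining deterministic and probabilistic
# evidence. Field names and weights are illustrative assumptions.
from difflib import SequenceMatcher

WEIGHTS = {"name": 0.5, "email": 0.3, "phone": 0.2}

def similarity(a, b):
    """String similarity in [0, 1]; missing values contribute no evidence."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def score_pair(rec_a, rec_b):
    # Deterministic rule: an exact tax-ID match is decisive on its own.
    if rec_a.get("tax_id") and rec_a.get("tax_id") == rec_b.get("tax_id"):
        return 1.0
    # Probabilistic rule: weighted evidence across several attributes.
    return sum(w * similarity(rec_a.get(f), rec_b.get(f))
               for f, w in WEIGHTS.items())

a = {"name": "Acme Corp", "email": "ap@acme.com", "phone": "555-0101"}
b = {"name": "ACME Corporation", "email": "ap@acme.com", "phone": "555-0101"}
candidate = {"left": a, "right": b, "score": score_pair(a, b)}
```

The output is exactly what the surrounding text describes: a scored candidate pair, not a merged entity.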
Merge
Merge is the consolidation step. When a match is accepted, the contributing source records are linked under a single entity. The merge operation does not discard source records. Instead, it creates a parent entity that references all contributing records while maintaining their provenance.
A merge can be additive, where a new source record joins an existing entity, or consolidating, where two previously independent entities collapse into one. Both cases require the system to track which sources contributed, when the merge happened, and on what evidence basis. Losing those links makes future audits and corrections nearly impossible.
Survivorship
After a merge, an entity may contain multiple conflicting values for the same attribute. Two source systems may supply different phone numbers, legal names, or addresses. Reltio's survivorship documentation describes configurable rules that determine the operational value for each attribute based on strategies like source priority, recency, completeness, or frequency.
Survivorship in MDM is attribute-level governance. A mastered customer might get its legal name from the ERP, its phone number from the CRM, and its address from a third-party enrichment provider. Each attribute can follow a different rule.
Survivorship can also be recalculated as source data changes or business rules evolve. A context layer should support this recalculation without breaking the stable entity identity that downstream systems depend on.
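A minimal sketch of attribute-level survivorship, assuming a per-attribute rule table: legal name follows source priority while phone follows recency. The source names, priorities, and rule set are hypothetical.

```python
# Illustrative survivorship: each attribute selects its operational value
# by its own strategy. Sources, priorities, and rules are assumptions.
SOURCE_PRIORITY = {"erp": 3, "crm": 2, "enrichment": 1}

def by_source_priority(contributions):
    # contributions: list of {"source": ..., "value": ..., "updated": ...}
    return max(contributions,
               key=lambda c: SOURCE_PRIORITY.get(c["source"], 0))["value"]

def by_recency(contributions):
    # ISO-8601 timestamps compare correctly as strings.
    return max(contributions, key=lambda c: c["updated"])["value"]

RULES = {"legal_name": by_source_priority, "phone": by_recency}

def survive(attribute_contributions):
    return {attr: RULES[attr](contribs)
            for attr, contribs in attribute_contributions.items() if contribs}

golden = survive({
    "legal_name": [
        {"source": "crm", "value": "Acme", "updated": "2024-05-01"},
        {"source": "erp", "value": "Acme Corporation Ltd", "updated": "2023-11-12"},
    ],
    "phone": [
        {"source": "crm", "value": "555-0101", "updated": "2024-06-01"},
        {"source": "erp", "value": "555-0100", "updated": "2024-01-15"},
    ],
})
```

Because `survive` is a pure function of the contributions and rules, recalculating survivorship when sources or rules change is just a re-run, which is the property the preceding paragraph asks the context layer to support.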
Golden record vs graph entity
There is a useful distinction between two representations of a mastered identity. The golden record is the canonical, flat, governed record: the single set of operational attribute values that the enterprise treats as authoritative. The graph entity is the richer representation that includes the golden attributes plus all source contributions, relationships, match evidence, and provenance edges.
Reltio's data model draws a similar line. An entity in Reltio is a record node with attributes, while a broader profile includes connected entities and interaction data. In a context layer built on a knowledge graph, both representations are usually needed. The golden record serves operational systems and APIs that need a single, clean answer. The graph entity serves governance workflows, audit, and any consumer that needs to understand why the golden record looks the way it does.
Collapsing everything into a flat golden record throws away evidence. Keeping only the graph without a canonical view forces every consumer to re-derive the right answer. A well-designed context layer maintains both and makes the relationship between them explicit.
Stable entity IDs and why they matter
When a mastered entity gets a new ID every time survivorship recalculates or a new source record merges in, downstream systems break. Reports lose historical continuity, API consumers lose their references, and agents that cached an entity ID yesterday cannot find the same entity today.
Stable entity IDs persist across source churn, survivorship recalculation, and re-mastering events. The ID represents the real-world entity, not any particular snapshot of its attributes. When two entities merge, one ID survives and the other becomes an alias that redirects to the surviving ID.
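The alias-redirect behavior described above can be sketched with a small registry; the entity IDs and class shape are hypothetical.

```python
# Minimal alias-redirect sketch for stable entity IDs. When two entities
# merge, the retired ID becomes an alias of the survivor, so lookups by
# either ID resolve to the same entity. Structure is illustrative.
class EntityRegistry:
    def __init__(self):
        self.aliases = {}  # retired ID -> surviving ID

    def merge(self, surviving_id, retired_id):
        self.aliases[retired_id] = surviving_id

    def resolve(self, entity_id):
        # Follow alias chains so IDs retired in earlier merges still resolve.
        while entity_id in self.aliases:
            entity_id = self.aliases[entity_id]
        return entity_id

reg = EntityRegistry()
reg.merge("ent-001", "ent-002")   # ent-002 collapses into ent-001
reg.merge("ent-003", "ent-001")   # later, ent-001 collapses into ent-003
```

A consumer that cached "ent-002" before either merge still resolves to the current surviving entity, which is exactly the continuity guarantee stable IDs exist to provide.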
This matters in the end-to-end enterprise context data flow because stable IDs are what let mastered entities move through validation, publication, and serving without breaking downstream references.
Confidence scoring and review workflows
Not every match decision should be automated. Confidence scoring is the control surface that determines what happens after matching. A practical operating model uses three bands:
High confidence matches, above an upper threshold, auto-merge.
Medium confidence matches, between thresholds, create review tasks for data stewards or candidate-link relationships that preserve the proposed match without acting on it.
Low confidence matches, below a lower threshold, remain separate.
Setting those thresholds is a business decision. The cost of a false positive merge, incorrectly collapsing two distinct entities, is almost always higher than the cost of a false negative. A false positive can contaminate customer views, permissions, analytics, and agent retrieval simultaneously. A false negative means two records stay separate and might cause duplicate outreach or a slightly inaccurate count, which is easier to detect and correct.
Teams often default to aggressive auto-merge thresholds in the name of data cleanliness and then spend months untangling incorrect merges that propagated into reporting, billing, and compliance systems. Start conservative. Lower the auto-merge threshold only after there is confidence in post-merge validation and unmerge workflows.
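The three-band routing above reduces to a few lines; the threshold values here are illustrative business settings, not recommendations.

```python
# Three-band confidence routing. Thresholds are illustrative assumptions
# and should be set conservatively, per the guidance above.
AUTO_MERGE_THRESHOLD = 0.95
REVIEW_THRESHOLD = 0.70

def route(score):
    if score >= AUTO_MERGE_THRESHOLD:
        return "auto_merge"
    if score >= REVIEW_THRESHOLD:
        return "steward_review"   # or persist a candidate-link relationship
    return "keep_separate"
```

Lowering `AUTO_MERGE_THRESHOLD` widens the auto-merge band at the expense of the review band, which is why it should only move after post-merge validation and unmerge workflows are trusted.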
Post-merge validation
A high match score tells you two records probably refer to the same entity. It does not tell you the resulting mastered entity is valid, complete, and safe to serve. These are different questions, and conflating them is a common source of downstream data quality failures.
Consider a supplier entity that, after merge, ends up with two conflicting tax jurisdictions, a revenue figure that exceeds the parent company's total, or a combination of industry codes that violates the ontology. The match was correct. The merged result is still semantically broken.
Post-merge validation runs business rules and ontology constraints against the mastered entity. It checks for impossible attribute combinations, missing required fields, relationship violations, and value-range breaches. This stage depends on governed definitions and constraints from ontology management and semantic modeling. If validation fails, the entity should be flagged for steward review rather than served as-is.
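A sketch of what such post-merge checks might look like, using the supplier example from above. The rule set, field names, and constraints are hypothetical stand-ins for governed ontology constraints.

```python
# Illustrative post-merge validation. Rules and field names are
# assumptions, not a real ontology.
def validate(entity):
    violations = []
    # Impossible attribute combination: one entity, two tax jurisdictions.
    if len(set(entity.get("tax_jurisdictions", []))) > 1:
        violations.append("conflicting tax jurisdictions")
    # Value-range breach relative to a related entity.
    parent_rev = entity.get("parent_revenue")
    if parent_rev is not None and entity.get("revenue", 0) > parent_rev:
        violations.append("revenue exceeds parent company total")
    # Missing required fields.
    for field in ("entity_id", "legal_name"):
        if not entity.get(field):
            violations.append(f"missing required field: {field}")
    return violations

issues = validate({
    "entity_id": "sup-42",
    "legal_name": "Acme GmbH",
    "tax_jurisdictions": ["DE", "CH"],   # conflict carried in by the merge
    "revenue": 120,
    "parent_revenue": 100,
})
# A non-empty result routes the entity to steward review, not to serving.
```

Note that this entity may have passed matching with a high score; the violations it carries are exactly the "match was correct, merged result is broken" case.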
For the dedicated deep dive on this control point, see constraint validation for enterprise context.
Provenance for merges and survivorship
Every merge decision and every survivorship choice should carry provenance. That means recording which source records contributed to a mastered entity, which match rule fired and at what confidence, which survivorship rule selected each operational value, and whether any human override occurred.
Without merge provenance, governance becomes guesswork. When a downstream analyst asks why a customer's address shows Dallas when the CRM says Chicago, the answer should be traceable: the survivorship rule selected the enrichment provider's address because it scored higher on completeness, and the merge was based on a 94 percent probabilistic match on name, email, and phone.
The broader model is covered in provenance and lineage for AI-ready enterprise context. For identity resolution specifically, provenance needs to capture match rationale, rule outcomes, source contributions per attribute, and the full history of overrides and re-mastering events.
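As a concrete illustration of the provenance described above, consider records like the following for the Dallas-versus-Chicago example; the schema and identifiers are assumptions for illustration.

```python
# Hypothetical provenance records for one merge decision and one
# survivorship decision. Schema and IDs are illustrative.
merge_provenance = {
    "entity_id": "cus-007",
    "merged_sources": ["crm:4711", "enrichment:ab12"],
    "match_rule": "probabilistic_name_email_phone",
    "match_confidence": 0.94,
    "decided_by": "auto",            # or a steward's user ID on override
    "decided_at": "2024-06-01T12:00:00Z",
}

attribute_provenance = {
    "attribute": "address",
    "operational_value": "Dallas, TX",
    "selected_from": "enrichment:ab12",
    "survivorship_rule": "completeness",
    "losing_values": [{"source": "crm:4711", "value": "Chicago, IL"}],
}
```

With both records attached to the mastered entity, the analyst's "why Dallas?" question is answerable directly from stored evidence rather than from log archaeology.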
Governance controls for entity mastering
Entity mastering is not a batch job that runs once. It requires ongoing governance controls: approval policies for merges above certain risk thresholds, exception handling for entities that fail validation, unmerge paths for correcting false positives, and audit trails that satisfy regulatory and internal compliance requirements.
Unmerge deserves particular attention. When a false positive merge is discovered, the system needs to cleanly separate the entities, reassign downstream references, and propagate the correction. If the architecture treats merges as irreversible, the eventual fix becomes manual cleanup across every consuming system.
Steward review workflows should surface the match evidence, the proposed survivorship values, and any validation warnings in a single view. Reviewers should not have to reconstruct the merge rationale from raw logs.
How mastered entities improve downstream systems
When identity resolution and entity mastering work correctly, the benefits propagate through every system that consumes entity data. Analytics teams get accurate counts and aggregations because they operate on deduplicated, governed entities rather than fragmented source records. Retrieval systems return complete context because the entity's full attribute set and relationship neighborhood are consolidated in one place.
Policy and compliance systems can evaluate rules against the authoritative golden record rather than guessing which of several partial records to trust. AI agents that ground their responses in entity data produce consistent answers because they reference a single, stable identity rather than whichever record a search system happened to return.
The operational value of mastered entities compounds as more systems consume them. Each additional consumer that relies on the context layer instead of building its own resolution logic reduces drift and duplication across the enterprise.
Common failure modes
The most damaging failure in any MDM pipeline is the false positive merge: two distinct entities collapsed into one. The error propagates to every downstream consumer before anyone notices because each system trusts the golden record. Conservative auto-merge thresholds and rigorous post-merge validation are the primary defenses, but neither eliminates the risk entirely. Teams need fast unmerge paths and downstream notification mechanisms to contain the blast radius when a false positive slips through.
Unstable entity IDs create a subtler but equally persistent problem. When a re-mastering run produces new IDs, reports lose continuity, caches go stale, and API integrations fail silently. The failure is quiet: no error messages, just slowly diverging data across systems that once agreed. ID stability has to be a design constraint from day one, because retrofitting it after downstream systems have already stored references is painful.
Opaque scoring undermines the human governance layer. If stewards cannot see why a match was proposed or why a survivorship rule selected a particular value, they cannot make informed approval or override decisions. The system becomes a black box that people rubber-stamp or ignore.
A related pattern is the team that reports a 98 percent match accuracy rate while serving entities that violate business rules. Match accuracy and entity validity are different metrics. Without post-merge validation, high match scores can mask semantically broken output.
Finally, survivorship rules that never get updated degrade quietly. Source quality shifts, new systems come online, and the relative trustworthiness of different sources changes over time. Survivorship governance needs periodic review, ideally triggered by data quality metrics rather than calendar reminders.
A practical operating model
A layered operating model for identity resolution and entity mastering in a context layer separates concerns so that each stage can evolve independently. The five tiers below represent distinct responsibilities, controls, and failure surfaces.
Source records are ingested and standardized but never modified. They remain the system of record for what each source actually said.
Candidate matches are proposed by matching logic, scored, and stored as explicit relationships between source records. No merge has occurred yet.
Mastered entities are created when a candidate match is accepted, automatically or by a steward. Each mastered entity carries a stable ID, survivorship-selected operational values, and links to all contributing source records.
Validation runs post-merge checks against ontology constraints and business rules, flagging any entity that fails for review before it reaches consumers.
Serving exposes mastered entities, with their stable IDs and governed attributes, to downstream consumers through APIs, graph queries, and event streams.
This separation means matching logic can improve without disrupting mastered entities. Survivorship rules can change without breaking stable IDs. Validation can tighten without requiring re-matching. Each layer has its own controls, failure modes, and audit surface.
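The tiers above can be illustrated as distinct data shapes, with source records kept immutable and mastered entities linking back to them. All identifiers, field names, and values here are hypothetical.

```python
# Rough data shapes for the five tiers; everything here is illustrative.

# Tier 1: source records, standardized but never modified.
sources = [
    {"id": "crm:1", "name": "Acme Corp", "email": "ap@acme.com"},
    {"id": "erp:9", "name": "ACME Corporation", "email": "ap@acme.com"},
]

# Tier 2: candidate matches stored as explicit, scored links. No merge yet.
candidates = [{"left": "crm:1", "right": "erp:9", "score": 0.91}]

# Tier 3: an accepted candidate yields a mastered entity with a stable ID,
# survivorship-selected values, and links to all contributing sources.
mastered = {
    "entity_id": "ent-100",          # stable across future re-mastering
    "sources": ["crm:1", "erp:9"],
    "operational": {"name": "ACME Corporation", "email": "ap@acme.com"},
}

# Tier 4: validation gates publication; tier 5 serves only what passes.
validation_passed = bool(mastered["operational"].get("name"))
serveable = [mastered] if validation_passed else []
```

Keeping these shapes separate is what lets matching logic, survivorship rules, and validation each change without disturbing the others.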
Frequently asked questions
What is the difference between identity resolution and entity mastering?
Identity resolution determines whether two or more records refer to the same real-world entity. Entity mastering takes resolved records and produces a single governed golden record with trusted attribute values and a stable identifier.
What is a golden record?
A golden record is the canonical, authoritative representation of a mastered entity. It contains the survivorship-selected values for each attribute and serves as the single version of truth for downstream systems.
Why are stable entity IDs important?
Stable entity IDs persist across merges, survivorship recalculations, and source changes. Without them, downstream reports lose historical continuity, API consumers lose their references, and cached entity lookups silently break.
What is survivorship in MDM?
Survivorship is the set of rules that determine which attribute value wins when multiple source records contribute conflicting data for the same field. Common strategies include source priority, recency, completeness, and frequency.
How does identity resolution relate to data deduplication?
Data deduplication often refers to removing duplicate records within a single system. Identity resolution extends that concept across multiple systems, using deterministic and probabilistic matching to identify records that refer to the same entity even when identifiers differ or are missing.
What happens when a merge is wrong?
A false positive merge requires an unmerge: cleanly separating the entities, reassigning downstream references, and propagating the correction. Architectures that treat merges as irreversible eventually face costly manual cleanup across every consuming system.