Identifier governance: multigraph disambiguation and machine-first anchoring

Type: Operational framework

Implements: Interpretive governance, SSA-E + A2 + Dual Web, Interpretive collision, Neighborhood contamination, Authority boundary, Interpretability perimeter

Doctrinal foundations: Doctrine

Conceptual version: 1.0

Stabilization date: 2026-02-20

Entity collisions, neighborhood contamination, and interpretive capture almost always have a structural cause: identity is carried by signals, not by identifiers.

In an interpreted web, a name is not an identifier. A profile is not proof. A link is not a relation. This framework formalizes a discipline of persistent identity to stabilize an entity across multiple graphs (site, aggregators, databases, RAG, agents).

Operational definition

Identifier governance: the set of rules for defining, publishing, and maintaining persistent identifiers and disambiguation mappings across graphs, in order to reduce collisions, limit out-of-perimeter inference, and make identity auditable.

Why this is essential

A name can be shared by multiple entities (homonymy).
A single entity can have variants (spelling, language, branding).
AI systems infer by neighborhood when they lack stable identifiers.
RAG environments can merge entities if documents are not anchored.

The goal is not to “make a model understand”. The goal is to anchor the entity persistently.

Application surfaces

Open web: response engines, external databases, aggregators.
RAG: chunking, routing, citations, vectors.
Agentic: execution and decisions that depend on provable identity.

Identifier types

Canonical on-site identifier: stable entity page URL + persistent @id.
External identifiers: profiles, databases, directories, registries.
Documentary identifiers (RAG): docId, version, source, author, date.
Relation identifiers: parent/subsidiary, sameAs, isBasedOn, relatedTo.
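
The four identifier types can be grouped in a single registry record. The following is a minimal sketch in Python, not a prescribed schema: the class names, the field names (`entity_id`, `external_ids`, `documents`, `relations`), and the example.org entity are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentRef:
    """Documentary identifier for RAG (field names are illustrative)."""
    doc_id: str
    version: str
    source: str
    author: str
    date: str  # ISO 8601 date string


@dataclass
class EntityRecord:
    """One registry entry grouping the four identifier types."""
    canonical_url: str  # stable entity page URL
    entity_id: str      # persistent @id
    external_ids: dict[str, str] = field(default_factory=dict)      # source -> identifier
    documents: list[DocumentRef] = field(default_factory=list)
    relations: list[tuple[str, str]] = field(default_factory=list)  # (relation type, target)


# Hypothetical entity, used only for illustration.
acme = EntityRecord(
    canonical_url="https://example.org/entity/acme",
    entity_id="https://example.org/entity/acme#id",
    external_ids={"registry": "REG-0001"},
    relations=[("sameAs", "https://directory.example/acme-industries")],
)
acme.documents.append(
    DocumentRef("doc-001", "1.0", "https://example.org/docs/about", "Editorial team", "2026-02-20")
)
```

Keeping external, documentary, and relation identifiers on the same record as the canonical identifier means every downstream lookup starts from one auditable entry.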

Framework rules (GID-1 to GID-10)

GID-1: a unique canonical identifier

Each entity must have a stable canonical identifier (URL + @id).

GID-2: separation of name vs identity

The name can change. The identifier must remain stable.
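
As a hedged illustration of this rule, the sketch below renames an entity while leaving its identifier untouched. It assumes a JSON-LD style record using the schema.org `alternateName` property to keep the former name; the `rename` helper and the example.org values are hypothetical.

```python
import json


def rename(entity: dict, new_name: str) -> dict:
    """Return a copy with a new name; "@id" and "url" are left untouched."""
    updated = dict(entity)
    former_names = list(entity.get("alternateName", []))
    former_names.append(entity["name"])  # keep the old name as a declared variant
    updated["alternateName"] = former_names
    updated["name"] = new_name
    return updated


# Hypothetical entity record, used only for illustration.
entity = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "@id": "https://example.org/entity/acme#id",
    "url": "https://example.org/entity/acme",
    "name": "Acme Industries",
}

renamed = rename(entity, "Acme Group")
print(json.dumps(renamed, indent=2))
```

After a rebrand, anything that stored the `@id` still resolves to the same entity, and the former name survives as a variant rather than becoming a second entity.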

GID-3: explicit variant mapping

Declare variants (languages, acronyms, former names) as variants of the same entity.
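
One possible way to make the variant mapping operational is a lookup table keyed on normalized names. This is a sketch under stated assumptions: the normalization choices (accent stripping, case folding, whitespace collapsing), the `resolve` helper, and the listed variants are illustrative, not part of the framework.

```python
import unicodedata
from typing import Optional


def normalize(name: str) -> str:
    """Strip accents, fold case, and collapse whitespace so variants compare reliably."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.casefold().split())


# Hypothetical canonical identifier and declared variants.
ENTITY_ID = "https://example.org/entity/acme#id"
DECLARED_VARIANTS = ["Acme Industries", "ACME", "Apex Tools"]  # brand, acronym, former name

VARIANT_INDEX = {normalize(variant): ENTITY_ID for variant in DECLARED_VARIANTS}


def resolve(name: str) -> Optional[str]:
    """Return the canonical identifier for a declared variant, or None if undeclared."""
    return VARIANT_INDEX.get(normalize(name))
```

For example, `resolve("  acmé   INDUSTRIES ")` returns the canonical identifier, while `resolve("Acme Bakery")` returns `None` because that name was never declared as a variant.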

GID-4: declared exclusions

Explicitly declare “what the entity is not” when homonymy is plausible.
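
A declared exclusion can be checked before any candidate source is attached to the entity. The sketch below assumes two hypothetical sets of URLs (`SAME_AS`, `NOT_SAME_AS`) and a three-way verdict; the names are illustrative.

```python
from enum import Enum


class Verdict(Enum):
    SAME = "same"          # declared as the same entity
    EXCLUDED = "excluded"  # declared as NOT the entity (homonym)
    UNKNOWN = "unknown"    # nothing declared: do not merge by default


# Hypothetical declarations for one entity.
SAME_AS = {"https://directory.example/acme-industries"}
NOT_SAME_AS = {"https://directory.example/acme-bakery"}


def classify(candidate_url: str) -> Verdict:
    """Exclusions are checked first so a homonym can never be treated as a match."""
    if candidate_url in NOT_SAME_AS:
        return Verdict.EXCLUDED
    if candidate_url in SAME_AS:
        return Verdict.SAME
    return Verdict.UNKNOWN
```

Keeping `UNKNOWN` distinct from `SAME` is the design choice that matters here: an undeclared source stays outside the perimeter until someone rules on it.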

GID-5: structured relations

Make relations explicit (subsidiary, founder, product, division) to prevent implicit fusions.
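
To show how explicit relations block implicit fusions, the sketch below stores relations as typed triples and allows a merge only across an explicit `sameAs` edge. The identifiers and helper names are hypothetical, and the relation labels (`sameAs`, `parentOrganization`) are used here as illustrative strings.

```python
# Hypothetical identifiers, used only for illustration.
ACME = "https://example.org/entity/acme#id"
ACME_LABS = "https://example.org/entity/acme-labs#id"
ACME_DIRECTORY = "https://directory.example/acme-industries"

# (subject, relation, object) triples.
RELATIONS = {
    (ACME, "sameAs", ACME_DIRECTORY),
    (ACME_LABS, "parentOrganization", ACME),
}


def may_merge(a: str, b: str) -> bool:
    """Two identifiers may be fused only when an explicit sameAs edge links them."""
    return (a, "sameAs", b) in RELATIONS or (b, "sameAs", a) in RELATIONS


def relations_between(a: str, b: str) -> list[str]:
    """List every declared relation type connecting the two identifiers."""
    return sorted(rel for (subj, rel, obj) in RELATIONS if {subj, obj} == {a, b})
```

Here the subsidiary and its parent are connected, so a neighborhood-based system would see them together, but `may_merge(ACME_LABS, ACME)` is false because the declared relation is not an equivalence.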

GID-6: endogenous coherence

The site must always point to the same identifier (no internal contradictions).
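
An internal contradiction of this kind can be detected mechanically. The sketch below assumes the site's structured data has already been extracted into a mapping of page URL to (entity name, declared @id); that input shape, the sample pages, and the `find_contradictions` helper are hypothetical.

```python
from collections import defaultdict

# page URL -> (entity name as written on the page, @id declared on the page)
PAGES = {
    "https://example.org/": ("Acme Industries", "https://example.org/entity/acme#id"),
    "https://example.org/about": ("Acme Industries", "https://example.org/entity/acme#id"),
    "https://example.org/blog/launch": ("Acme Industries", "https://example.org/about#org"),
}


def find_contradictions(pages: dict) -> dict:
    """Return names that are declared with more than one identifier, with the pages involved."""
    ids_by_name = defaultdict(lambda: defaultdict(list))
    for page_url, (name, entity_id) in pages.items():
        ids_by_name[name][entity_id].append(page_url)
    return {name: dict(ids) for name, ids in ids_by_name.items() if len(ids) > 1}


conflicts = find_contradictions(PAGES)
for name, ids in conflicts.items():
    print(name, "is declared with", len(ids), "different identifiers")
```

In the sample data, the blog page declares a second identifier for the same name, so the check reports one contradiction to fix.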

GID-7: exogenous coherence

Correct dominant external sources that use erroneous identifiers.

GID-8: RAG anchoring

Each chunked document must retain a source identifier, a version, and a relation to the entity.
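
A minimal sketch of this rule, assuming a naive fixed-size character chunker: every chunk carries the document identifier, the version, and the entity identifier with it. The `Chunk` fields, the `chunk_document` helper, and the sample values are hypothetical; a real pipeline would likely split on sentences or tokens instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    doc_id: str     # source identifier
    version: str    # document version
    entity_id: str  # relation to the entity
    position: int   # order of the chunk within the document


def chunk_document(text: str, doc_id: str, version: str, entity_id: str, size: int = 200) -> list:
    """Split on fixed character windows; every chunk keeps the anchoring metadata."""
    return [
        Chunk(text[start:start + size], doc_id, version, entity_id, index)
        for index, start in enumerate(range(0, len(text), size))
    ]


# Hypothetical document and identifiers, used only for illustration.
ENTITY_ID = "https://example.org/entity/acme#id"
text = "Acme Industries builds industrial valves. " * 20
chunks = chunk_document(text, "doc-001", "1.0", ENTITY_ID, size=200)
```

Because the metadata travels with each chunk rather than with the parent document, a retrieved passage can still be attributed to the right entity after it has been separated from its source.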

GID-9: identity proof

For critical attributes, require a fidelity proof that includes the identifier, not just the text.
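
The difference between a text match and an identifier-backed proof can be sketched as follows. The `Claim` and `Evidence` structures, the `is_proven` check, and the two homonymous entities are assumptions made for illustration; the substring test stands in for whatever fidelity check is actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    entity_id: str  # identifier the source passage is anchored to
    doc_id: str
    text: str


@dataclass(frozen=True)
class Claim:
    entity_id: str  # identifier the claim is about
    attribute: str
    value: str


def is_proven(claim: Claim, evidence: Evidence) -> bool:
    """A claim counts as proven only if the identifier AND the text both match."""
    return evidence.entity_id == claim.entity_id and claim.value in evidence.text


# Two hypothetical homonyms with different identifiers.
ACME = "https://example.org/entity/acme#id"
OTHER_ACME = "https://other.example/entity/acme#id"

claim = Claim(ACME, "foundingDate", "1998")
anchored = Evidence(ACME, "doc-001", "Acme was founded in 1998.")
homonym = Evidence(OTHER_ACME, "doc-777", "Acme was founded in 1998.")
```

Both passages contain the same sentence, but only the one anchored to the claimed identifier is accepted; the homonym's passage is rejected even though its text matches.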

GID-10: monitoring and regression

Periodically test for collisions and verify that identifiers remain coherent after each release.
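
A regression check of this kind can be as small as a table of queries and expected identifiers. In the sketch below, `run_battery`, the test cases, and the deliberately faulty `resolver_after_release` are hypothetical stand-ins for the system under test.

```python
ACME_ID = "https://example.org/entity/acme#id"

# (query, expected identifier); None means "must NOT resolve to any known entity".
CASES = [
    ("Acme Industries", ACME_ID),
    ("ACME", ACME_ID),
    ("Acme Bakery", None),  # declared homonym
]


def run_battery(resolve, cases):
    """Return the cases where the resolver's answer differs from the expected identifier."""
    failures = []
    for query, expected in cases:
        actual = resolve(query)
        if actual != expected:
            failures.append((query, expected, actual))
    return failures


def resolver_after_release(query: str):
    """Stand-in resolver with a regression: prefix matching absorbs the homonym."""
    return ACME_ID if query.lower().startswith("acme") else None


failures = run_battery(resolver_after_release, CASES)
for query, expected, actual in failures:
    print(f"collision: {query!r} resolved to {actual}, expected {expected}")
```

Including negative cases (queries that must not resolve) is what lets the battery catch collisions, not only missed matches.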

Implementation process

Define the entity and create its canonical identifier.
Create an internal disambiguation page if necessary.
Declare variants and exclusions.
Structure relations (internal graph).
Map external identifiers and correct divergences.
In RAG, attach each document to the entity identifier.
Test across multiple AI systems and monitor collisions.

Expected artifacts

Identifier registry (canonical + external).
Variant and exclusion table.
Entity relation map (internal graph).
Multigraph mapping (dominant sources, statuses).
Test battery (collisions, substitutions, contaminations).

FAQ

Why is this not just “sameAs”?

Because sameAs only asserts that two references denote the same entity. Governance also covers exclusions, relations, versions, and RAG/agentic implementation.

What most frequently breaks identifiers?

URL migrations, rebrands, duplicate pages, and uncorrected aggregators.

What is the main benefit?

It drastically reduces collisions and makes identity provable, and therefore governable.