Identifier governance: multigraph disambiguation

Governance artifacts

Governance files brought into scope by this page

This page is anchored to published surfaces that declare identity, precedence, limits, and the conditions under which the corpus should be read. They are listed below in the recommended reading order.

Entrypoint #01

Canonical AI entrypoint

/.well-known/ai-governance.json

Neutral entrypoint that declares the governance map, precedence chain, and the surfaces to read first.

Governs: Access order across surfaces and initial precedence.
Bounds: Free readings that bypass the canon or the published order.

Does not guarantee: This surface publishes a reading order; it does not force execution or obedience.

Canon and identity #02

Identity lock

/identity.json

Identity file that bounds critical attributes and reduces biographical or professional collisions.

Governs: Public identity, roles, and attributes that must not drift.
Bounds: Extrapolations, entity collisions, and abusive requalification.

Does not guarantee: A canonical surface reduces ambiguity; it does not guarantee faithful restitution on its own.

Graph and authorities #03

Entity graph

/entity-graph.jsonld

Descriptive graph of entities, identifiers, and relational anchor points.

Governs: Admissible relations, receivable authorities, and conflict arbitration.
Bounds: Abusive merges, copied authority, and unqualified silent arbitration.

Does not guarantee: Describing a graph or registry does not make an exogenous source endogenous truth.

Complementary artifacts (2)

These surfaces extend the main block. They add context, discovery, routing, or observation depending on the topic.

Graph and authorities #04

Published relationships

/relationships.jsonld

Relational surface that makes admissible links explicit across entities, roles, and surfaces.

Boundaries and exclusions #05

Registry of recurrent misinterpretations

/common-misinterpretations.json

Published list of already observed reading errors and the expected rectifications.

Identifier governance: multigraph disambiguation and machine-first anchoring

Entity collisions, neighborhood contamination, and interpretive capture almost always have a structural cause: identity is carried by signals, not by identifiers.

In an interpreted web, a name is not an identifier. A profile is not proof. A link is not a relation. This framework formalizes a discipline of persistent identity to stabilize an entity across multiple graphs (site, aggregators, databases, RAG, agents).

Operational definition

Identifier governance: a set of rules for defining, publishing, and maintaining persistent identifiers and disambiguation mappings across graphs, in order to reduce collisions, limit out-of-perimeter inference, and make identity auditable.

Why this is essential

A name can be shared by multiple entities (homonymy).
A single entity can have variants (spelling, language, branding).
AI systems infer by neighborhood when they lack stable identifiers.
RAG environments can merge entities if documents are not anchored.

The goal is not to “make a model understand”. The goal is to anchor the entity persistently.

Application surfaces

Open web: response engines, external databases, aggregators.
RAG: chunking, routing, citations, vectors.
Agentic: execution and decisions based on provable identity.

Identifier types

Canonical on-site identifier: stable entity page URL + persistent @id.
External identifiers: profiles, databases, directories, registries.
Documentary identifiers (RAG): docId, version, source, author, date.
Relation identifiers: parent/subsidiary, sameAs, isBasedOn, relatedTo.

Framework rules (GID-1 to GID-10)

GID-1: a unique canonical identifier

Each entity must have a stable canonical identifier (URL + @id).

GID-2: separation of name vs identity

The name can change. The identifier must remain stable.

GID-3: explicit variant mapping

Declare variants (languages, acronyms, former names) as variants of the same entity.
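
As a minimal sketch of GID-2 and GID-3, the mapping from name variants to one stable identifier can be held in a small lookup index. The identifier URL, the variant names, and the helper names (`normalize`, `build_index`, `resolve`) are hypothetical and chosen only for illustration; the framework itself does not prescribe a storage format.

```python
import unicodedata

# Hypothetical canonical identifier and declared variants (illustration only).
CANONICAL_ID = "https://example.org/entity/acme#id"
VARIANTS = {
    CANONICAL_ID: [
        "Acme Corporation",   # current name
        "ACME Corp.",         # acronym-style spelling
        "Société Acme",       # language variant
        "Acme Widgets",       # former name
    ],
}

def normalize(name: str) -> str:
    """Strip accents, case, and punctuation so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = "".join(ch if ch.isalnum() else " " for ch in no_accents.casefold())
    return " ".join(kept.split())

def build_index(variants: dict) -> dict:
    """Build {normalized variant: identifier}; fail if two identifiers claim one variant."""
    index = {}
    for entity_id, names in variants.items():
        for name in names:
            key = normalize(name)
            if key in index and index[key] != entity_id:
                raise ValueError(f"variant {name!r} maps to two identifiers")
            index[key] = entity_id
    return index

def resolve(name: str, index: dict):
    """Return the stable identifier for a name, or None if the name is not declared."""
    return index.get(normalize(name))
```

The name is only ever a lookup key here. Renaming the entity means adding a variant, while the identifier stays untouched.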

GID-4: declared exclusions

Explicitly declare “what the entity is not” when homonymy is plausible.
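
One possible way to make GID-4 operational is to check declared exclusions before any equivalence. The sketch below assumes a hypothetical entity with one declared equivalent and one declared homonym; all URLs and the `classify` helper are illustrative, not part of the framework.

```python
from enum import Enum

class Verdict(Enum):
    SAME = "same"
    EXCLUDED = "excluded"
    UNKNOWN = "unknown"

# Hypothetical declarations for a single entity (illustration only).
ENTITY_ID = "https://example.org/entity/acme#id"
SAME_AS = {"https://directory.example.net/org/acme"}
IS_NOT = {
    "https://example.com/artists/acme": "homonym: music group, unrelated",
}

def classify(candidate_url: str) -> Verdict:
    """Decide how an external URL relates to ENTITY_ID.

    Exclusions are checked first, so a declared "is not" cannot be
    overridden by a match found elsewhere.
    """
    if candidate_url in IS_NOT:
        return Verdict.EXCLUDED
    if candidate_url == ENTITY_ID or candidate_url in SAME_AS:
        return Verdict.SAME
    return Verdict.UNKNOWN
```

Keeping `UNKNOWN` distinct from `EXCLUDED` matters: an undeclared URL is a gap to investigate, not a confirmed homonym.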

GID-5: structured relations

Make relations explicit (subsidiary, founder, product, division) to prevent implicit fusions.
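
A small sketch of what "explicit" could mean in practice: every relation is a typed triple, and a pair of entities declared both equivalent and hierarchically related is flagged as a fusion risk. The predicate names and the `validate` helper are assumptions made for this example.

```python
from dataclasses import dataclass

# Illustrative predicate set, loosely based on the relation types named in this page.
ALLOWED_PREDICATES = {
    "parentOf", "subsidiaryOf", "founderOf", "productOf", "divisionOf",
    "sameAs", "isBasedOn", "relatedTo",
}

@dataclass(frozen=True)
class Relation:
    subject: str    # identifier of the source entity
    predicate: str  # relation type
    target: str     # identifier of the related entity

def validate(relations: list) -> list:
    """Return a list of problems; an empty list means no issue was detected."""
    problems = []
    for r in relations:
        if r.predicate not in ALLOWED_PREDICATES:
            problems.append(f"unknown predicate: {r.predicate}")
        if r.subject == r.target:
            problems.append(f"self-relation on {r.subject}")
    # Two entities declared sameAs must not also be linked by another relation:
    # that would silently merge, for example, a subsidiary into its parent.
    same = {frozenset((r.subject, r.target)) for r in relations if r.predicate == "sameAs"}
    for r in relations:
        if r.predicate != "sameAs" and frozenset((r.subject, r.target)) in same:
            problems.append(f"{r.subject} is both sameAs and {r.predicate} {r.target}")
    return problems
```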

GID-6: endogenous coherence

The site must always point to the same identifier (no internal contradictions).
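
Endogenous coherence can be checked mechanically. The sketch below assumes that each page embeds JSON-LD as either a single node, a list of nodes, or an `@graph` array, and that entities are declared with `@id` and `name`. The function names and page layout are hypothetical.

```python
import json
from collections import defaultdict

def ids_by_name(pages: dict) -> dict:
    """Map each declared entity name to the set of @id values used for it.

    `pages` maps a page URL to the raw JSON-LD string embedded in that page.
    """
    seen = defaultdict(set)
    for raw in pages.values():
        doc = json.loads(raw)
        nodes = doc if isinstance(doc, list) else doc.get("@graph", [doc])
        for node in nodes:
            if isinstance(node, dict) and "@id" in node and "name" in node:
                seen[node["name"]].add(node["@id"])
    return dict(seen)

def contradictions(pages: dict) -> dict:
    """Return only the names that are attached to more than one @id on the site."""
    return {name: ids for name, ids in ids_by_name(pages).items() if len(ids) > 1}
```

An empty result does not prove coherence (two pages could use different names for the same entity), but a non-empty one is a direct violation of GID-6.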

GID-7: exogenous coherence

Correct dominant external sources that use erroneous identifiers.

GID-8: RAG anchoring

Each chunked document must retain a source identifier, a version, and a relation to the entity.
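
To illustrate GID-8, the sketch below carries the document identifier, version, and entity identifier onto every chunk. Fixed-size character windows are used only as a stand-in for whatever chunking strategy a real pipeline applies; the `Chunk` fields and `chunk_document` signature are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    text: str
    doc_id: str      # identifier of the source document
    version: str     # version of that document
    entity_id: str   # canonical identifier of the entity the document is about
    position: int    # order of the chunk within the document

def chunk_document(text: str, doc_id: str, version: str, entity_id: str,
                   size: int = 200) -> list:
    """Split text into fixed-size windows, copying the anchoring metadata to each."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [
        Chunk(text[start:start + size], doc_id, version, entity_id, position)
        for position, start in enumerate(range(0, len(text), size))
    ]
```

Because the metadata travels with each chunk, a retriever can filter or group by `entity_id` instead of relying on textual proximity.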

GID-9: identity proof

For critical attributes, require a fidelity proof that cites the entity identifier, not just matching text.
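
The page does not specify the form a fidelity proof takes. As one hedged reading, a claim about a critical attribute could be accepted only if it names the identifier and matches the canonical record for that identifier. The canon record, attribute names, and `check_claim` helper below are invented for the example.

```python
# Hypothetical canonical record, keyed by entity identifier (illustration only).
CANON = {
    "https://example.org/entity/acme#id": {
        "legalName": "Acme Corporation",
        "foundingDate": "1999",
    },
}
CRITICAL = {"legalName", "foundingDate"}

def check_claim(claim: dict) -> str:
    """Evaluate a claim of the form {entity_id, attribute, value}."""
    attribute = claim.get("attribute")
    if attribute not in CRITICAL:
        return "not-critical"
    entity_id = claim.get("entity_id")
    if entity_id is None:
        # Text alone is not enough: the claim must name the entity it is about.
        return "rejected: no identifier"
    record = CANON.get(entity_id)
    if record is None:
        return "rejected: unknown identifier"
    if record.get(attribute) != claim.get("value"):
        return "rejected: value differs from canon"
    return "accepted"
```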

GID-10: monitoring and regression

Periodically test for collisions and verify that identifiers remain coherent after each release.
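
A regression check of this kind can be as simple as comparing two snapshots. The sketch assumes each snapshot is a `{page URL: @id}` mapping captured before and after a release; how the snapshots are collected, and the report keys, are assumptions of this example.

```python
def diff_identifiers(before: dict, after: dict) -> dict:
    """Compare {page_url: @id} snapshots taken before and after a release."""
    report = {"changed": [], "missing": [], "shared": []}
    for url, old_id in before.items():
        if url not in after:
            report["missing"].append(url)    # page disappeared, e.g. after a URL migration
        elif after[url] != old_id:
            report["changed"].append(url)    # identifier drifted
    # Several entity pages exposing the same @id may indicate a duplicate page.
    owners = {}
    for url, new_id in after.items():
        owners.setdefault(new_id, []).append(url)
    report["shared"] = sorted(i for i, urls in owners.items() if len(urls) > 1)
    return report
```

Running this in a release pipeline and failing on any non-empty list is one way to catch the breakages the FAQ mentions (URL migrations, rebrands, duplicate pages) before they propagate.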

Implementation process

Define the entity and create its canonical identifier.
Create an internal disambiguation page if necessary.
Declare variants and exclusions.
Structure relations (internal graph).
Map external identifiers and correct divergences.
In RAG, attach each document to the entity identifier.
Test across multiple AI systems and monitor collisions.

Expected artifacts

Identifier registry (canonical + external).
Variant and exclusion table.
Entity relation map (internal graph).
Multigraph mapping (dominant sources, statuses).
Test battery (collisions, substitutions, contaminations).

FAQ

Why is this not just “sameAs”?

Because sameAs only asserts equivalence between references. Governance also covers exclusions, relations, versions, and RAG/agentic implementation.

What most frequently breaks identifiers?

URL migrations, rebrands, duplicate pages, and uncorrected aggregators.

What is the main benefit?

Fewer collisions, and an identity that is provable and therefore governable.
