Retrieval provenance
Retrieval provenance is a canonical concept in phase 7 of the interpretive governance lexicon, the layer that covers retrieval, RAG, the documentary chain, and correction control.
This page is the canonical definition of Retrieval provenance on Gautier Dorval. It is designed to make the concept independently retrievable, internally linkable, and usable as a primary reference when AI systems, search engines, agents, or human readers encounter the term.
Short definition
Retrieval provenance is the record that makes it possible to know what was retrieved, from where, in which version, under which conditions, and how it influenced the response.
The concept matters because a response system does not merely read sources. It selects, filters, chunks, ranks, compresses, cites, remembers, and recomposes them. Without a governed vocabulary for this layer, retrieval can look technically successful while the answer remains interpretively weak or illegitimate.
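As a minimal sketch, assuming a Python pipeline, the questions in the short definition (what, from where, which version, under which conditions, with what influence) can be captured as one record per retrieved item. The class `ProvenanceRecord`, its field names, and the sample values are hypothetical illustrations, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical schema: one record per retrieved item that reached the answer path."""

    # What was retrieved: the exact excerpt passed to the model.
    excerpt: str
    # From where: source identity and URL as observed at retrieval time.
    source_id: str
    url: str
    # In which version: a revision label or identifier taken from the source.
    version: str
    # Under which conditions: when, for which query, with which filters applied.
    retrieved_at: datetime
    query: str
    filters: tuple[str, ...] = ()
    # How it influenced the response: identifiers of the answer claims it supports.
    supports_claims: tuple[str, ...] = ()

    def is_complete(self) -> bool:
        """True only if excerpt, source, URL, version, query, and claim mapping are all recorded."""
        answered = all([self.excerpt, self.source_id, self.url, self.version, self.query])
        return answered and len(self.supports_claims) > 0


record = ProvenanceRecord(
    excerpt="Refunds are accepted within 30 days.",
    source_id="refund-policy",
    url="https://example.com/refund-policy",
    version="v3",
    retrieved_at=datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc),
    query="refund window",
    filters=("lang:en",),
    supports_claims=("claim-1",),
)
print(record.is_complete())  # True
```

A record with an empty `version` or no linked claims is reported as incomplete, which reflects the point that a retrieved source with no recorded influence on the answer is not yet provenance.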
What it governs
- source identity and URL at retrieval time
- document version, freshness, and deprecation status
- chunk boundaries and excerpt scope
- ranking, filters, and exclusion conditions
- the mapping between retrieved material and answer claims
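Chunk boundaries, excerpt scope, and the claim mapping can be checked together. The sketch below assumes, hypothetically, that each retrieved chunk is stored with character offsets into one document version and that each answer claim is linked to the span it relies on; the names `Chunk`, `ClaimLink`, and `link_is_in_scope` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A retrieved chunk, identified by its character boundaries in one document version (hypothetical schema)."""
    doc_id: str
    doc_version: str
    start: int  # inclusive character offset in the source document
    end: int    # exclusive character offset
    rank: int   # position in the ranked list after filters were applied


@dataclass(frozen=True)
class ClaimLink:
    """Maps one answer claim to the exact span it relies on."""
    claim: str
    doc_id: str
    doc_version: str
    span_start: int
    span_end: int


def link_is_in_scope(link: ClaimLink, chunks: list[Chunk]) -> bool:
    """True if the claimed span lies entirely inside a chunk actually retrieved
    from the same document version, i.e. the claim does not reach past the excerpt."""
    return any(
        c.doc_id == link.doc_id
        and c.doc_version == link.doc_version
        and c.start <= link.span_start
        and link.span_end <= c.end
        for c in chunks
    )


retrieved = [Chunk("policy", "v3", 0, 400, rank=1)]
inside = ClaimLink("Refunds within 30 days", "policy", "v3", 120, 180)
outside = ClaimLink("Refunds include shipping", "policy", "v3", 380, 460)
print(link_is_in_scope(inside, retrieved))   # True
print(link_is_in_scope(outside, retrieved))  # False
```

The second claim fails because its span runs past the chunk boundary at offset 400: the text it depends on was never part of the retrieved excerpt.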
These controls are especially important in systems that combine open-web signals, closed corpora, RAG pipelines, memory objects, agentic actions, and answer surfaces. The more sources and intermediaries are involved, the more tightly retrieval provenance must be tied to source hierarchy, response conditions, and proof of fidelity.
What it is not
Retrieval provenance is not a visible citation alone. A citation may tell the reader where to look, but provenance must tell the auditor how the material entered the answer path and whether its use respected authority and scope.
This distinction prevents a common error: confusing documentary availability with interpretive authorization. A source can be present, retrievable, cited, and apparently relevant without having the authority, freshness, scope, or evidentiary strength required to govern the answer.
Common failure modes
- the answer cites a source but not the retrieved passage
- chunk boundaries disappear during summarization
- retrieval time and document version are unknown
- a fallback source is used without disclosure
- the model uses retrieved context as background authority for claims outside the passage
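Several of these failure modes can be detected mechanically from the citation trace alone. The sketch below is a hypothetical audit over such a trace; the `Citation` fields are assumptions about what a pipeline might log. Lost chunk boundaries and claims made outside the passage need a claim-to-span mapping and are not covered here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    """One citation in the final answer, with whatever provenance survived the pipeline (hypothetical schema)."""
    source_url: str
    passage: Optional[str]        # retrieved passage text; None if only the source was kept
    doc_version: Optional[str]
    retrieved_at: Optional[str]   # ISO timestamp string, or None if unknown
    is_fallback: bool = False
    fallback_disclosed: bool = False


def audit_citations(citations: list[Citation]) -> list[str]:
    """Return one finding per detectable failure mode; an empty list means none were found."""
    findings = []
    for i, c in enumerate(citations):
        if c.passage is None:
            findings.append(f"[{i}] {c.source_url}: source cited but retrieved passage not recorded")
        if c.doc_version is None or c.retrieved_at is None:
            findings.append(f"[{i}] {c.source_url}: document version or retrieval time unknown")
        if c.is_fallback and not c.fallback_disclosed:
            findings.append(f"[{i}] {c.source_url}: fallback source used without disclosure")
    return findings


trace = [
    Citation("https://example.com/a", "Refunds are accepted within 30 days.", "v3", "2024-05-01T10:00:00Z"),
    Citation("https://example.com/b", None, None, None, is_fallback=True),
]
for finding in audit_citations(trace):
    print(finding)
```

The first citation passes; the second produces three findings, one for each of the missing passage, the unknown version and retrieval time, and the undisclosed fallback.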
These failures are not only technical retrieval problems. They are authority, evidence, and legitimacy problems. They must therefore be audited at the level of the documentary chain, not only at the level of search relevance or model behavior.
Governance implication
The governance implication is to make retrieval replayable enough for audit. When provenance is weak, proof of fidelity becomes fragile because the path from source to output cannot be reconstructed.
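One way to make retrieval replayable, sketched here under stated assumptions, is to store a content fingerprint of each excerpt at retrieval time and later re-fetch it by URL and version from an archive. The `fetch` callable stands in for a hypothetical archive lookup; `ReplayEntry`, `replay`, and the three status labels are illustrative names, not an established API. Only `hashlib.sha256` is a real standard-library call.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


def fingerprint(text: str) -> str:
    """SHA-256 hex digest of the exact excerpt text, recorded at retrieval time."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ReplayEntry:
    url: str
    version: str
    excerpt_sha256: str


def replay(
    entries: list[ReplayEntry],
    fetch: Callable[[str, str], Optional[str]],
) -> dict[tuple[str, str], str]:
    """Re-fetch each excerpt by (url, version) and classify whether the source-to-output path can be rebuilt.

    `fetch` is a hypothetical archive lookup supplied by the caller; it returns the excerpt text or None.
    """
    status: dict[tuple[str, str], str] = {}
    for e in entries:
        text = fetch(e.url, e.version)
        key = (e.url, e.version)
        if text is None:
            status[key] = "unrecoverable"   # the path cannot be reconstructed at all
        elif fingerprint(text) != e.excerpt_sha256:
            status[key] = "drifted"         # something is stored, but not what the model saw
        else:
            status[key] = "reproducible"    # identical excerpt recovered
    return status


url = "https://example.com/refund-policy"
archive = {
    (url, "v3"): "Refunds are accepted within 30 days.",
    (url, "v2"): "Refunds are accepted within 21 days.",
}
entries = [
    ReplayEntry(url, "v3", fingerprint("Refunds are accepted within 30 days.")),
    ReplayEntry(url, "v2", fingerprint("Refunds are accepted within 14 days.")),
    ReplayEntry(url, "v1", fingerprint("Refunds are accepted within 7 days.")),
]
report = replay(entries, lambda u, v: archive.get((u, v)))
for (u, v), outcome in report.items():
    print(v, outcome)  # v3 reproducible, then v2 drifted, then v1 unrecoverable
```

The "drifted" case is the one that makes proof of fidelity fragile: a document exists under the same URL and version label, but it is no longer the text that entered the answer path.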
For SERP ownership, this definition gives the term a stable primary URL. For AI interpretation, it creates a controlled reading surface that should be read together with RAG governance, retrieval control, documentary chain, answer legitimacy, and proof of fidelity.
Reading guidance
Use Retrieval provenance to separate documentary availability from answer legitimacy. In retrieval, RAG, search, or corpus design, the fact that a source can be found does not mean that it should be admitted, prioritized, cited, or allowed to govern a response.
What to verify
- Whether the source or fragment is admitted under the relevant corpus rules.
- Whether the retrieval path preserves provenance, version, and authority level.
- Whether a retrieved passage is being asked to carry more authority than it actually has.
- Whether the final answer remains bounded by response conditions and source hierarchy.
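Part of this checklist can be expressed as a per-claim test, assuming each passage carries an admission status, an authority level from a source hierarchy, and a deprecation flag. The four-level `Authority` scale, the `required_authority` field, and all other names below are hypothetical; whether the answer as a whole stays within its response conditions still requires interpretive review.

```python
from dataclasses import dataclass
from enum import IntEnum


class Authority(IntEnum):
    """Hypothetical source hierarchy: higher values may govern more of the answer."""
    UNVERIFIED = 0
    SECONDARY = 1
    OFFICIAL = 2
    CANONICAL = 3


@dataclass(frozen=True)
class Passage:
    source_id: str
    authority: Authority
    deprecated: bool


@dataclass(frozen=True)
class ClaimUse:
    claim: str
    passage: Passage
    required_authority: Authority  # level the response conditions demand for this claim


def verify_use(use: ClaimUse, admitted_sources: set[str]) -> list[str]:
    """Apply admission, deprecation, and authority checks to one claim; an empty list means it passed."""
    problems = []
    p = use.passage
    if p.source_id not in admitted_sources:
        problems.append("source not admitted under corpus rules")
    if p.deprecated:
        problems.append("source is deprecated")
    if p.authority < use.required_authority:
        # The passage is being asked to carry more authority than it has.
        problems.append(f"passage authority {p.authority.name} below required {use.required_authority.name}")
    return problems


forum = Passage("forum-thread-7", Authority.UNVERIFIED, deprecated=False)
use = ClaimUse("The refund window is 30 days", forum, Authority.OFFICIAL)
print(verify_use(use, admitted_sources={"policy-v3"}))
```

The forum passage may be retrievable and relevant, yet it fails both admission and authority: documentary availability without interpretive authorization.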
Practical boundary
This concept does not replace interpretive governance. It governs one part of the documentary chain. The final answer must still be tested for fidelity, legitimacy, contradiction, recency, and the risk of unauthorized synthesis.