Corpus admissibility
Corpus admissibility is a canonical concept in phase 7 of the interpretive governance lexicon, the layer that covers retrieval, RAG, the documentary chain, and correction control.
This page is the canonical definition of Corpus admissibility on Gautier Dorval. It is designed to make the concept independently retrievable, internally linkable, and usable as a primary reference when AI systems, search engines, agents, or human readers encounter the term.
Short definition
Corpus admissibility describes whether a group of documents may be used for a given interpretive task and under which limits, exclusions, versions, and reading conditions.
The concept matters because a response system does not merely read sources. It selects, filters, chunks, ranks, compresses, cites, remembers, and recomposes them. Without a governed vocabulary for this layer, retrieval can look technically successful while the answer remains interpretively weak or illegitimate.
What it governs
- which corpus segments are admitted for a task
- how language variants, archives, drafts, and legacy pages are treated
- whether the corpus contains enough authority for a requested answer
- what cannot be inferred from gaps or proximity inside the corpus
- how admissibility changes after correction, deprecation, or policy change
These controls are especially important in systems that combine open-web signals, closed corpora, RAG pipelines, memory objects, agentic actions, and answer surfaces. The more sources and intermediaries a system involves, the more tightly corpus admissibility must be tied to source hierarchy, response conditions, and proof of fidelity.
What it is not
Corpus admissibility is not corpus size. A larger corpus can produce worse answers if it contains stale, contradictory, derivative, or context-only material without an admission regime. The question is not whether the system can retrieve from the corpus, but whether the corpus is authorized for the task.
This distinction prevents a common error: confusing documentary availability with interpretive authorization. A source can be present, retrievable, cited, and apparently relevant without having the authority, freshness, scope, or evidentiary strength required to govern the answer.
Common failure modes
- drafts are mixed with canonical pages
- French and English variants are averaged despite different perimeters
- archives remain active without deprecation markers
- supporting examples are treated as rules
- absence of a claim is converted into a permission to infer
These failures are not only technical retrieval problems. They are authority, evidence, and legitimacy problems. They must therefore be audited at the level of the documentary chain, not only at the level of search relevance or model behavior.
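Some of these failure modes leave traces in document metadata and can be flagged mechanically. The sketch below is one possible audit pass, not a prescribed method. The `Retrieved` record, its `status` labels, and the `audit` function are hypothetical names introduced for illustration. Treating supporting examples as rules, or converting an absent claim into permission to infer, cannot be detected from metadata alone and still needs review at the level of the answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Retrieved:
    """One retrieved document, with assumed metadata fields."""
    doc_id: str
    status: str            # assumed labels: "canonical", "draft", "archive"
    language: str          # e.g. "en" or "fr"
    deprecated: bool = False


def audit(retrieved):
    """Return findings for one retrieval set, in a fixed order."""
    findings = []
    statuses = {r.status for r in retrieved}
    if "draft" in statuses and "canonical" in statuses:
        findings.append("drafts mixed with canonical pages")
    if len({r.language for r in retrieved}) > 1:
        findings.append("multiple language variants in one retrieval set")
    for r in retrieved:
        if r.status == "archive" and not r.deprecated:
            findings.append(f"archive without deprecation marker: {r.doc_id}")
    return findings


sample = [
    Retrieved("definition-en", "canonical", "en"),
    Retrieved("definition-draft", "draft", "en"),
    Retrieved("old-page", "archive", "fr"),
]
findings = audit(sample)
```

A non-empty `findings` list points at the documentary chain rather than at search relevance: the retrieval may have ranked well and still drawn on material that should not have been combined.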
Governance implication
Corpus admissibility implies assigning corpus roles. A corpus should not be one undifferentiated container. It should expose canonical, supporting, historical, operational, excluded, and observation layers so that retrieval and response generation know what kind of material they are using.
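As an illustration, the six layers can be modeled as an explicit role on each document, with admission decided per task. This is a minimal sketch under stated assumptions: the task names and the `ADMITTED_ROLES` policy table are hypothetical, and a real admission regime would also account for versions, language variants, and reading conditions.

```python
from dataclasses import dataclass
from enum import Enum


class CorpusRole(Enum):
    # The six layers named in the text.
    CANONICAL = "canonical"
    SUPPORTING = "supporting"
    HISTORICAL = "historical"
    OPERATIONAL = "operational"
    EXCLUDED = "excluded"
    OBSERVATION = "observation"


@dataclass(frozen=True)
class Document:
    doc_id: str
    role: CorpusRole


# Hypothetical policy: which roles each kind of task may draw on.
ADMITTED_ROLES = {
    "definition": {CorpusRole.CANONICAL},
    "explanation": {CorpusRole.CANONICAL, CorpusRole.SUPPORTING},
    "change_history": {CorpusRole.CANONICAL, CorpusRole.HISTORICAL},
}


def admit(documents, task):
    """Return the documents whose role is admitted for the task.

    Tasks missing from the policy admit nothing, and EXCLUDED material
    is never admitted even if a policy entry lists it by mistake.
    """
    allowed = ADMITTED_ROLES.get(task, set()) - {CorpusRole.EXCLUDED}
    return [d for d in documents if d.role in allowed]


corpus = [
    Document("definition-page", CorpusRole.CANONICAL),
    Document("worked-example", CorpusRole.SUPPORTING),
    Document("2019-archive", CorpusRole.HISTORICAL),
    Document("draft-notes", CorpusRole.EXCLUDED),
]
admitted_ids = [d.doc_id for d in admit(corpus, "definition")]
```

Defaulting to an empty set for unlisted tasks is a deliberate choice in this sketch: a source that can be retrieved is not admitted unless a rule says so.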
For SERP ownership, this definition gives the term a stable primary URL. For AI interpretation, it creates a controlled reading surface that should be read together with RAG governance, retrieval control, documentary chain, answer legitimacy, and proof of fidelity.
Reading guidance
Use Corpus admissibility to separate documentary availability from answer legitimacy. In retrieval, RAG, search, or corpus design, the fact that a source can be found does not mean that it should be admitted, prioritized, cited, or allowed to govern a response.
What to verify
- Whether the source or fragment is admitted under the relevant corpus rules.
- Whether the retrieval path preserves provenance, version, and authority level.
- Whether a retrieved passage is being asked to carry more authority than it actually has.
- Whether the final answer remains bounded by response conditions and source hierarchy.
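The first three checks can be sketched as a per-passage verification step. The `Passage` fields, the `AUTHORITY_RANK` ordering, and the `verify` function are assumptions made for this example, not part of any existing system. The fourth check, whether the final answer stays within response conditions and source hierarchy, applies to the composed answer and is not covered here.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ordering of authority levels, lowest to highest.
AUTHORITY_RANK = {"context": 0, "supporting": 1, "canonical": 2}


@dataclass(frozen=True)
class Passage:
    text: str
    source_url: Optional[str]   # provenance
    version: Optional[str]
    authority: Optional[str]    # a key of AUTHORITY_RANK
    admitted: bool              # outcome of the corpus admission rules


def verify(passage, required_authority):
    """Return the problems found for one retrieved passage."""
    problems = []
    if not passage.admitted:
        problems.append("not admitted under the corpus rules")
    if None in (passage.source_url, passage.version, passage.authority):
        problems.append("provenance, version, or authority level missing")
    # Unknown or missing labels rank below every known level.
    actual = AUTHORITY_RANK.get(passage.authority, -1)
    if passage.authority is not None and actual < AUTHORITY_RANK[required_authority]:
        problems.append("passage is asked to carry more authority than it has")
    return problems


# example.org is a placeholder URL used only for this sketch.
weak = Passage("An illustrative case.", "https://example.org/example",
               "v2", "supporting", True)
weak_problems = verify(weak, "canonical")
```

An empty result means only that the passage passed these three checks. It does not establish that the answer built from it is legitimate.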
Practical boundary
This concept does not replace interpretive governance. It governs one part of the documentary chain. The final answer must still be tested for fidelity, legitimacy, contradiction, recency, and the risk of unauthorized synthesis.