Corpus admissibility

Corpus admissibility is a canonical concept in phase 7 of the interpretive governance lexicon, the layer that covers retrieval, RAG, the documentary chain, and correction control.

This page is the canonical definition of Corpus admissibility on Gautier Dorval. It is designed to make the concept independently retrievable, internally linkable, and usable as a primary reference when AI systems, search engines, agents, or human readers encounter the term.

Short definition

Corpus admissibility describes whether a group of documents may be used for a given interpretive task and under which limits, exclusions, versions, and reading conditions.

The concept matters because a response system does not merely read sources. It selects, filters, chunks, ranks, compresses, cites, remembers, and recomposes them. Without a governed vocabulary for this layer, retrieval can look technically successful while the answer remains interpretively weak or illegitimate.

What it governs

which corpus segments are admitted for a task
how language variants, archives, drafts, and legacy pages are treated
whether the corpus contains enough authority for a requested answer
what cannot be inferred from gaps or proximity inside the corpus
how admissibility changes after correction, deprecation, or policy change

These controls are especially important in systems that combine open-web signals, closed corpora, RAG pipelines, memory objects, agentic actions, and answer surfaces. The more sources and intermediaries a system involves, the more tightly the concept must be connected to source hierarchy, response conditions, and proof of fidelity.

What it is not

Corpus admissibility is not corpus size. A larger corpus can produce worse answers if it contains stale, contradictory, derivative, or context-only material without an admission regime. The question is not whether the system can retrieve from the corpus, but whether the corpus is authorized for the task.

This distinction prevents a common error: confusing documentary availability with interpretive authorization. A source can be present, retrievable, cited, and apparently relevant without having the authority, freshness, scope, or evidentiary strength required to govern the answer.

Common failure modes

drafts are mixed with canonical pages
French and English variants are averaged despite different perimeters
archives remain active without deprecation markers
supporting examples are treated as rules
absence of a claim is converted into a permission to infer

These failures are not only technical retrieval problems. They are authority, evidence, and legitimacy problems. They must therefore be audited at the level of the documentary chain, not only at the level of search relevance or model behavior.
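Two of the failure modes above, drafts mixed with canonical pages and archives left active without deprecation markers, are visible in document metadata alone. The following is a minimal sketch of such an audit, assuming each document carries a status and a deprecation flag. The `Doc` fields, the status values, and the `audit_corpus` name are hypothetical illustrations, not part of any defined schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    status: str               # assumed values: "canonical", "draft", "archive"
    deprecated: bool = False  # assumed marker for retired material


def audit_corpus(docs):
    """Return (doc_id, finding) pairs for admission problems visible in metadata."""
    findings = []
    for doc in docs:
        if doc.status == "draft":
            # A draft sitting next to canonical pages can be retrieved as if it were one.
            findings.append((doc.doc_id, "draft present in admitted corpus"))
        if doc.status == "archive" and not doc.deprecated:
            # An archive without a marker still looks active to retrieval.
            findings.append((doc.doc_id, "archive lacks deprecation marker"))
    return findings


docs = [
    Doc("definition-current", "canonical"),
    Doc("definition-draft", "draft"),
    Doc("definition-old", "archive"),
    Doc("definition-older", "archive", deprecated=True),
]
findings = audit_corpus(docs)
```

The remaining failure modes (averaged language variants, examples treated as rules, inference from absence) concern how material is read rather than how it is labeled, so a metadata check of this kind would not detect them.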

Governance implication

Governing admissibility means assigning corpus roles. A corpus should not be one undifferentiated container. It should expose canonical, supporting, historical, operational, excluded, and observation layers so that retrieval and response generation know what kind of material they are using.
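One way to make corpus roles explicit is to attach a role to every document and let the pipeline look up what that role permits. The sketch below uses the six layers named above. The mapping from role to permitted use (`MAY_GOVERN`, `MAY_INFORM`) is an assumption made for illustration, and a real admission regime would define it per task.

```python
from enum import Enum


class CorpusRole(Enum):
    CANONICAL = "canonical"
    SUPPORTING = "supporting"
    HISTORICAL = "historical"
    OPERATIONAL = "operational"
    EXCLUDED = "excluded"
    OBSERVATION = "observation"


# Assumed policy for illustration: only canonical material may govern an answer;
# the other non-excluded layers may be used as context only.
MAY_GOVERN = {CorpusRole.CANONICAL}
MAY_INFORM = {
    CorpusRole.SUPPORTING,
    CorpusRole.HISTORICAL,
    CorpusRole.OPERATIONAL,
    CorpusRole.OBSERVATION,
}


def usage_for(role):
    """Return how material with this role may be used: govern, context, or exclude."""
    if role in MAY_GOVERN:
        return "govern"
    if role in MAY_INFORM:
        return "context"
    return "exclude"


usages = {role.value: usage_for(role) for role in CorpusRole}
```

Keeping the policy in data rather than in retrieval code means a correction or deprecation changes a document's role without changing the pipeline.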

For SERP ownership, this definition gives the term a stable primary URL. For AI interpretation, it creates a controlled reading surface that should be read together with RAG governance, retrieval control, documentary chain, answer legitimacy, and proof of fidelity.

Reading guidance

Use Corpus admissibility to separate documentary availability from answer legitimacy. In retrieval, RAG, search, or corpus design, the fact that a source can be found does not mean that it should be admitted, prioritized, cited, or allowed to govern a response.

What to verify

Whether the source or fragment is admitted under the relevant corpus rules.
Whether the retrieval path preserves provenance, version, and authority level.
Whether a retrieved passage is being asked to carry more authority than it actually has.
Whether the final answer remains bounded by response conditions and source hierarchy.
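The first three checks can be approximated per retrieved passage. The sketch below assumes each passage records its admission status, source, version, and a numeric authority level. The `Passage` fields, the authority scale, and `verify_passage` are hypothetical names chosen for this example. The fourth check applies to the final answer rather than to a single passage and is not covered here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Passage:
    text: str
    source: Optional[str]   # where the passage came from, if the pipeline kept it
    version: Optional[str]  # version of the source, if the pipeline kept it
    authority: int          # assumed scale: higher means more authority
    admitted: bool          # result of the corpus admission rules


def verify_passage(passage, required_authority):
    """Return the list of failed checks; an empty list means the passage passes."""
    failures = []
    if not passage.admitted:
        failures.append("not admitted under corpus rules")
    if passage.source is None or passage.version is None:
        failures.append("provenance or version lost in retrieval")
    if passage.authority < required_authority:
        failures.append("asked to carry more authority than it has")
    return failures


canonical = Passage("Short definition text", "definition page", "v2", 3, True)
orphan = Passage("Example text", None, "v1", 1, True)

canonical_failures = verify_passage(canonical, required_authority=3)
orphan_failures = verify_passage(orphan, required_authority=2)
```

Returning the failed checks, instead of a single boolean, keeps the reason for rejection available for later audit of the documentary chain.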

Practical boundary

This concept does not replace interpretive governance. It governs one part of the documentary chain. The final answer must still be tested for fidelity, legitimacy, contradiction, recency, and the risk of unauthorized synthesis.
