Discoverability vs training

Governance artifacts

Governance files brought into scope by this page

This page is anchored to published surfaces that declare identity, precedence, limits, and the conditions under which the corpus should be read. Their order below gives the recommended reading sequence.

Entrypoint #01

Canonical AI entrypoint

/.well-known/ai-governance.json

Neutral entrypoint that declares the governance map, precedence chain, and the surfaces to read first.

Governs: Access order across surfaces and initial precedence.
Bounds: Free readings that bypass the canon or the published order.

Does not guarantee: This surface publishes a reading order; it does not force execution or obedience.

Entrypoint #02

Public AI manifest

/ai-manifest.json

Structured inventory of the surfaces, registries, and modules that extend the canonical entrypoint.

Governs: Access order across surfaces and initial precedence.
Bounds: Free readings that bypass the canon or the published order.

Does not guarantee: This surface publishes a reading order; it does not force execution or obedience.

Context and versioning #03

Site context

/site-context.md

Notice that qualifies the nature of the site, its reference function, and its non-transactional limits.

Governs: Editorial framing, temporality, and the readability of explicit changes.
Bounds: Silent drifts and readings that assume stability without checking versions.

Does not guarantee: Versioning makes a gap auditable; it does not automatically correct outputs already in circulation.

Complementary artifacts (3)

These surfaces extend the main block. They add context, discovery, routing, or observation depending on the topic.

Boundaries and exclusions #04

Registry of recurrent misinterpretations

/common-misinterpretations.json

Published list of already observed reading errors and the expected rectifications.

Canon and identity #05

Definitions canon

/canon.md

Canonical surface that fixes identity, roles, negations, and divergence rules.

Policy and legitimacy #06

Q-Layer in Markdown

/response-legitimacy.md

Canonical surface for response legitimacy, clarification, and legitimate non-response.

Evidence layer

Probative surfaces brought into scope by this page

This page does more than point to governance files. It is also anchored to surfaces that make observation, traceability, fidelity, and audit more reconstructible. Their order below makes the minimal evidence chain explicit.

01 Observation map: Observatory map
02 Evidence artifact: site-context.md
03 Evidence artifact: common-misinterpretations.json

Observation index #01

Observatory map

/observations/observatory-map.json

Machine-first index of published observation resources, snapshots, and comparison points.

Makes provable: Where the observation objects used in an evidence chain are located.
Does not prove: The quality of a result or the fidelity of a particular response.
Use when: Locating baselines, ledgers, snapshots, and derived artifacts.

Artifact #02

site-context.md

/site-context.md

Published surface that contributes to making an evidence chain more reconstructible.

Makes provable: Part of the observation, trace, audit, or fidelity chain.
Does not prove: Total proof, a guarantee of obedience, or implicit certification.
Use when: A page needs to make its evidence regime explicit.

Artifact #03

common-misinterpretations.json

/common-misinterpretations.json

Published surface that contributes to making an evidence chain more reconstructible.

Makes provable: Part of the observation, trace, audit, or fidelity chain.
Does not prove: Total proof, a guarantee of obedience, or implicit certification.
Use when: A page needs to make its evidence regime explicit.

Why this distinction has become necessary

In interpreted environments, the same site can be consulted for several purposes that must not be confused.

A system can:

discover that a resource exists;
read that resource in order to answer;
reuse its content for training, alignment, or consolidation purposes.

When these three regimes are merged, an organization believes it is governing a single thing when it is in fact trying to bound three distinct uses.

Discoverability

Discoverability means that a resource can be found, explored, or retained as a reading candidate.

It mostly concerns:

the public existence of a surface;
its technical accessibility;
its structural clarity;
the likelihood that it is drawn into a reading path.

Discoverability does not yet tell us:

whether the resource will be cited;
whether its content will be faithfully rendered;
whether its text will be reused for training purposes.

Reading for answer generation

A second regime appears when the system does more than discover a resource and actually uses it to construct an answer.

At that point, the relevant question is no longer only “can the system see me?” but rather:

which surface is actually read;
which part is retained;
which source hierarchy is applied;
under which conditions the response remains legitimate.

That regime should be read alongside “Indexing, answer generation, and training”; “Signal, proof, and compliance”; and the Evidence layer.

Training

Training belongs to a third regime.

It is not equivalent to discovering a resource or citing it once. It concerns the use of a corpus to modify parameters, behaviors, synthesis preferences, or probability distributions.

That is why training must not be described as if it were a simple reading event.

A discoverability signal is not a training signal. An access signal is not proof of reuse. A governance artifact is not a guarantee of obedience.
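The three negations above can be sketched as a small lookup that ties each observed signal to the only regime it can support. This is a minimal illustration, not a published schema: the signal names and the `supports` helper are hypothetical and were chosen only to make the rule concrete.

```python
from enum import Enum


class Regime(Enum):
    DISCOVERY = "discovered"
    ANSWER_READING = "read for answering"
    TRAINING = "reused for training"


# Hypothetical signal names (assumptions, not taken from any real log format).
# Each signal maps to the single regime it is direct evidence for.
SIGNAL_SUPPORTS = {
    "listed_in_sitemap": Regime.DISCOVERY,
    "crawler_fetch_logged": Regime.DISCOVERY,
    "cited_in_answer": Regime.ANSWER_READING,
}


def supports(signal: str, claim: Regime) -> bool:
    """Return True only if the signal is direct evidence for the claimed regime."""
    return SIGNAL_SUPPORTS.get(signal) is claim


# An access signal supports a discovery claim, but not a training claim.
access_shows_discovery = supports("crawler_fetch_logged", Regime.DISCOVERY)
access_shows_training = supports("crawler_fetch_logged", Regime.TRAINING)
```

Note that no signal in this sketch maps to `Regime.TRAINING`: under these assumptions, nothing observable from the publisher's side counts as direct evidence of reuse.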

Why confusion persists

Confusion persists for three reasons.

1. Public vocabulary remains blurred

The market often mixes:

AI visibility;
crawler access;
answer citation;
training use;
declared compliance.

2. Policy surfaces do not govern the same level

robots.txt, llms.txt, meta directives, headers, governance manifests, and contextual pages do not all govern the same layer of the problem. See Machine policy surfaces.
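As one illustration of the layer that robots.txt addresses, the sketch below uses Python's standard `urllib.robotparser` to evaluate a small rule set. The robots.txt content, the user-agent names `ExampleTrainingBot` and `ExampleSearchBot`, and the `example.org` URL are all placeholders assumed for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; both user-agent names are placeholders.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.org/canon.md"

# The only question this surface answers: may this agent fetch this URL?
training_bot_may_fetch = parser.can_fetch("ExampleTrainingBot", url)
search_bot_may_fetch = parser.can_fetch("ExampleSearchBot", url)
```

The result is a per-agent fetch permission and nothing more. It does not say whether a fetched page will be cited, rendered faithfully, or reused for training, and it only binds crawlers that choose to honor it.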

3. Systems themselves do not always expose the regime they are using

A system may discover without citing, cite without fidelity, or rely on a resource without making that use legible. Hence the need for a doctrine that distinguishes regimes instead of collapsing them.

Doctrinal consequence

An organization that wants to govern its machine presence correctly must publish enforceable distinctions between:

being discovered;
being read for answering;
being reused for training.

That distinction is not cosmetic. It conditions the validity of published policies, the interpretation of observations, and the ability to avoid abusive conclusions.
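One way to keep the three regimes from collapsing into a single switch is to give each its own field in whatever policy record is published. The sketch below is an assumption-laden illustration: the `RegimePolicy` type, its field names, and the value strings are hypothetical and do not correspond to any of the governance files listed on this page.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RegimePolicy:
    """Hypothetical policy record with one independent field per regime."""

    discovery: str
    answer_reading: str
    training: str


# Illustrative values only; each regime is declared separately,
# so a decision about one never implies a decision about another.
policy = RegimePolicy(
    discovery="allowed",
    answer_reading="allowed-with-attribution",
    training="not-granted",
)

# Serialize the record as it might be published.
published = json.dumps(asdict(policy), indent=2)
```

Publishing such a record declares a position for each regime; as with the other governance artifacts, it does not by itself guarantee that a system will comply.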

Consequence for applied surfaces

An applied surface such as Better Robots.txt can materialize part of that distinction on WordPress. It must not be read as if it exhausted the full doctrine of these regimes.

The proper reading order is therefore:

doctrine of regimes;
clarification of limits;
applied surface;
bounded proof repository.

For the concrete implementation layer, see the Better Robots.txt applied surface.