Indexing, answer generation, and training

Governance artifacts

Governance files brought into scope by this page

This page is anchored to published surfaces that declare identity, precedence, limits, and the reading conditions of the corpus. The order below is the recommended reading sequence.

Entrypoint #01

Canonical AI entrypoint

/.well-known/ai-governance.json

Neutral entrypoint that declares the governance map, precedence chain, and the surfaces to read first.

Governs: Access order across surfaces and initial precedence.
Bounds: Free readings that bypass the canon or the published order.

Does not guarantee: This surface publishes a reading order; it does not force execution or obedience.

Entrypoint #02

Public AI manifest

/ai-manifest.json

Structured inventory of the surfaces, registries, and modules that extend the canonical entrypoint.

Governs: Access order across surfaces and initial precedence.
Bounds: Free readings that bypass the canon or the published order.

Does not guarantee: This surface publishes a reading order; it does not force execution or obedience.

Context and versioning #03

Site context

/site-context.md

Notice that qualifies the nature of the site, its reference function, and its non-transactional limits.

Governs: Editorial framing, temporality, and the readability of explicit changes.
Bounds: Silent drifts and readings that assume stability without checking versions.

Does not guarantee: Versioning makes a gap auditable; it does not automatically correct outputs already in circulation.

Complementary artifacts (2)

These surfaces extend the main block. They add context, discovery, routing, or observation depending on the topic.

Canon and identity #04

Definitions canon

/canon.md

Canonical surface that fixes identity, roles, negations, and divergence rules.

Policy and legitimacy #05

Q-Layer in Markdown

/response-legitimacy.md

Canonical surface for response legitimacy, clarification, and legitimate non-response.
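The reading sequence listed above can be sketched as a small ordered table. This is only an illustration: the paths and roles are copied from the listing, while the origin `https://example.org`, the tuple layout, and the function name `reading_sequence` are assumptions introduced here, not part of any published schema.

```python
from urllib.parse import urljoin

# Hypothetical origin: the page does not name a host.
BASE_URL = "https://example.org"

# (reading position, role, path, block), copied from the listing above.
SURFACES = [
    (1, "Canonical AI entrypoint", "/.well-known/ai-governance.json", "main"),
    (2, "Public AI manifest", "/ai-manifest.json", "main"),
    (3, "Site context", "/site-context.md", "main"),
    (4, "Definitions canon", "/canon.md", "complementary"),
    (5, "Q-Layer in Markdown", "/response-legitimacy.md", "complementary"),
]


def reading_sequence(base_url, surfaces):
    """Return absolute URLs sorted by the published reading position."""
    ordered = sorted(surfaces, key=lambda s: s[0])
    return [urljoin(base_url, path) for _, _, path, _ in ordered]


if __name__ == "__main__":
    for url in reading_sequence(BASE_URL, SURFACES):
        print(url)
```

The sketch only fixes an order of consultation. Consistent with the "Does not guarantee" notes above, nothing in it forces a reader or a system to follow that order.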

Evidence layer

Probative surfaces brought into scope by this page

This page does more than point to governance files. It is also anchored to surfaces that make observation, traceability, fidelity, and audit more reconstructible. Their order below makes the minimal evidence chain explicit.

01 Observation map: Observatory map
02 Weak observation: Q-Ledger

Observation index #01

Observatory map

/observations/observatory-map.json

Machine-first index of published observation resources, snapshots, and comparison points.

Makes provable: Where the observation objects used in an evidence chain are located.
Does not prove: The quality of a result or the fidelity of a particular response.
Use when: Locating baselines, ledgers, snapshots, and derived artifacts.

Observation ledger #02

Q-Ledger

/.well-known/q-ledger.json

Public ledger of inferred sessions that makes some observed consultations and sequences visible.

Makes provable: That a behavior was observed as weak, dated, contextualized trace evidence.
Does not prove: Actor identity, system obedience, or strong proof of activation.
Use when: It is necessary to distinguish descriptive observation from strong attestation.
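The "Makes provable" and "Does not prove" limits of the two evidence surfaces can be sketched as a lookup. The paths come from the cards above; the claim labels (for example `weak_dated_trace_of_behavior`) and the helper `can_support` are hypothetical names chosen for this illustration and do not reflect the actual JSON structure of either file.

```python
# Claim labels below are hypothetical shorthand for the card text above.
EVIDENCE = {
    "observatory-map": {
        "path": "/observations/observatory-map.json",
        "makes_provable": {"location_of_observation_objects"},
        "does_not_prove": {"result_quality", "response_fidelity"},
    },
    "q-ledger": {
        "path": "/.well-known/q-ledger.json",
        "makes_provable": {"weak_dated_trace_of_behavior"},
        "does_not_prove": {
            "actor_identity",
            "system_obedience",
            "strong_activation_proof",
        },
    },
}


def can_support(surface, claim):
    """True only if the surface explicitly makes the claim provable."""
    entry = EVIDENCE[surface]
    if claim in entry["does_not_prove"]:
        return False
    return claim in entry["makes_provable"]


if __name__ == "__main__":
    print(can_support("q-ledger", "weak_dated_trace_of_behavior"))
    print(can_support("q-ledger", "actor_identity"))
```

The default is deliberately conservative: a claim that a surface does not list as provable is treated as unsupported.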

Three regimes, three recurring mistakes

In discussions about AI and the web, three phenomena are often collapsed into one:

indexing;
answer generation;
training.

That collapse produces false diagnoses. One assumes a site is understood because it is indexed. One assumes it is trained upon because it is cited. One assumes a refusal of indexing forbids every other use.

None of these equivalences is reliable.

Indexing

Indexing concerns the existence of a resource in a system of discovery, recall, or ranking.

It may remain minimal. A resource can be:

known without being often mobilized;
indexed without being well understood;
retained as a URL while its role is still poorly interpreted.

Indexing is therefore not yet a semantic victory. It is a possible condition of future access.

Answer generation

Answer generation is a different regime.

Here the source is no longer merely discovered. It is used, summarized, prioritized, reformulated, or placed in competition with other sources to produce an output.

In that regime, the questions become:

who speaks;
which source is prioritized;
which formulation is retained;
how much of the canon survives synthesis.

A resource may have modest classical indexing and still be structurally mobilizable as an answer surface. That is what structural visibility helps describe.

Training

Training belongs to another layer.

The point is no longer simply to recall a resource when answering, but to use corpora so as to stabilize behaviors, preferences, or parameters.

That is why the sentence “an AI cited me, therefore it trained on my content” proves nothing on its own.

Likewise, the sentence “my site is blocked from indexing, therefore it cannot serve any other purpose” remains doctrinally unsafe.

Why the separation matters

The separation changes how published artifacts should be read.

robots.txt mostly concerns crawl access and procedural discovery.
llms.txt mostly belongs to a documentary framing layer.
contextual pages, manifests, and governance files mostly bound reading conditions, interpretation, and precedence.

None of these surfaces, taken alone, automatically converts one regime into another.
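To make the first point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the crawler name `ExampleBot`, and the `example.org` URLs are invented for the illustration. The parser answers one question only: may this user agent fetch this URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Allow: /
"""


def crawl_allowed(robots_txt, user_agent, url):
    """Answer the crawl-access question for one agent and one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The result covers crawl access only. It carries no information
    # about answer generation or training.
    print(crawl_allowed(ROBOTS_TXT, "ExampleBot", "https://example.org/private/page"))
    print(crawl_allowed(ROBOTS_TXT, "ExampleBot", "https://example.org/public"))
    print(crawl_allowed(ROBOTS_TXT, "OtherBot", "https://example.org/private/page"))
```

A `False` result here describes a crawl directive for one agent. It does not, by itself, say whether content obtained by other routes is mobilized in answers or present in a training corpus.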

Consequence for observation

When an organization observes that it appears in an AI answer, it should avoid three abuses:

confusing that presence with proof of dominant indexing;
confusing that presence with proof of training;
believing that the answer alone documents fidelity of reading.

The correct reading goes through the Evidence layer, Q-Ledger, and Q-Metrics.
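The three abuses listed above can be sketched as a guard on inference. The `Observation` type, its fields, and the result keys are hypothetical names for this illustration. The sketch does not model the Evidence layer, Q-Ledger, or Q-Metrics, whose formats are not specified on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """A single sighting of a source in an AI answer (hypothetical shape)."""
    source_url: str
    appeared_in_answer: bool


def warranted_conclusions(obs):
    """Map an observation to what it does and does not establish."""
    return {
        # The only thing the sighting itself shows.
        "answer_generation_observed": obs.appeared_in_answer,
        # The three conclusions the text warns against drawing.
        "dominant_indexing_proven": False,
        "training_proven": False,
        "reading_fidelity_documented": False,
    }


if __name__ == "__main__":
    obs = Observation("https://example.org/canon.md", appeared_in_answer=True)
    for claim, value in warranted_conclusions(obs).items():
        print(f"{claim}: {value}")
```

The three `False` values are constants on purpose: under this reading, a presence in an answer cannot raise them, whatever the source.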

Consequence for derived instruments

A WordPress implementation such as Better Robots.txt can help organize some operational layers. It must not be treated as if it governed indexing, answer generation, and training by itself.

For that reason, applied instruments should be read after the doctrine of regimes, not before it.