Phantom URL audit

Governance artifacts

Governance files brought into scope by this page

This page is anchored to published surfaces that declare identity, precedence, limits, and the corpus reading conditions. Their order below gives the recommended reading sequence.

Discovery and routing #01

Content inventory

/site-content-index.json

Machine-first inventory of the pages, articles, and surfaces published on the site.

Governs: Discoverability, crawl orientation, and the mapping of published surfaces.
Bounds: Incomplete readings that ignore structure, routes, or the preferred markdown surface.

Does not guarantee: A good discovery surface improves access; it is not sufficient on its own to govern reconstruction.

Artifact #02

site-coherence-map.md

/site-coherence-map.md

Published machine-first governance surface.

Governs: Part of the corpus reading conditions.
Bounds: An inference zone that would otherwise remain implicit.

Does not guarantee: This file does not, on its own, guarantee system obedience.

Observability #03

Q-Metrics JSON

/.well-known/q-metrics.json

Descriptive metrics surface for observing gaps, snapshots, and comparisons.

Governs: The description of gaps, drifts, snapshots, and comparisons.
Bounds: Confusion between observed signal, fidelity proof, and actual steering.

Does not guarantee: An observation surface documents an effect; it does not, on its own, guarantee representation.

Evidence layer

Probative surfaces brought into scope by this page

This page does more than point to governance files. It is also anchored to surfaces that make observation, traceability, fidelity, and audit more reconstructible. Their order below makes the minimal evidence chain explicit.

01 Canon and scope: Definitions canon
02 Weak observation: Q-Ledger
03 Derived measurement: Q-Metrics

Canonical foundation #01

Definitions canon

/canon.md

Opposable base for identity, scope, roles, and negations that must survive synthesis.

Makes provable: The reference corpus against which fidelity can be evaluated.
Does not prove: Neither that a system already consults it nor that an observed response stays faithful to it.
Use when: Before any observation, test, audit, or correction.

Observation ledger #02

Q-Ledger

/.well-known/q-ledger.json

Public ledger of inferred sessions that makes some observed consultations and sequences visible.

Makes provable: That a behavior was observed as weak, dated, contextualized trace evidence.
Does not prove: Neither actor identity, system obedience, nor strong proof of activation.
Use when: Descriptive observation must be distinguished from strong attestation.

Descriptive metrics #03

Q-Metrics

/.well-known/q-metrics.json

Derived layer that makes some variations more comparable from one snapshot to another.

Makes provable: That an observed signal can be compared, versioned, and challenged as a descriptive indicator.
Does not prove: Neither the truth of a representation, the fidelity of an output, nor real steering on its own.
Use when: To compare windows, prioritize an audit, and document a before/after.

This framework turns suspicious 404s into interpretive audit data. Its goal is not to create every requested page. Its goal is to distinguish noise, historical errors, hostile scans, and real phantom URLs.

A useful phantom URL is not only non-existent. It is non-existent and plausible. The work therefore consists in verifying non-existence, measuring coherence, clustering patterns, and producing a governed decision.

Expected outcome

At the end of the audit, each phantom URL cluster should be classified into one of the following decisions:

create a canonical surface;
redirect to an existing page;
reinforce internal linking or the coherence map;
publish a clarification;
publish an exclusion or negative definition;
monitor without acting;
intentionally leave as 404;
respond with 410 if invalidity must be explicit.

The decision must be documented. A phantom URL does not automatically become an editorial priority.

Step 1: collect traces

Collect data over a long enough period to distinguish accidents from regularities.

Possible sources:

server logs;
CDN logs;
404 reports;
application logs;
analytics;
Search Console;
404 monitoring tools;
referrals from AI assistants;
URLs cited in generative answers;
controlled prompt tests.

Minimum fields to keep:

requested URL;
response status;
timestamp;
referrer;
user-agent;
IP address or aggregated fingerprint;
country or region if available;
previous entry page if observable;
frequency and recurrence.
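
The minimum fields above can be held in one record type, with frequency and recurrence derived from the records rather than stored. The following is a minimal sketch under assumptions: the names `TraceRecord` and `recurrence` and all field names are illustrative and do not come from any particular log format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class TraceRecord:
    """One observed request. Field names are hypothetical."""
    url: str
    status: int
    timestamp: datetime
    referrer: Optional[str] = None
    user_agent: Optional[str] = None
    client_fingerprint: Optional[str] = None  # IP address or aggregated fingerprint
    region: Optional[str] = None
    previous_page: Optional[str] = None


def recurrence(records):
    """Return hit count and number of distinct days for each requested URL."""
    summary = {}
    for record in records:
        hits, days = summary.get(record.url, (0, set()))
        days.add(record.timestamp.date())
        summary[record.url] = (hits + 1, days)
    return {
        url: {"hits": hits, "distinct_days": len(days)}
        for url, (hits, days) in summary.items()
    }
```

Counting distinct days as well as raw hits is one way to separate a single burst from a pattern that returns over time.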

Step 2: exclude obvious noise

Before interpretation, remove access patterns that belong to technical or hostile noise.

Typical exclusions:

/wp-admin/, /phpmyadmin/, configuration files, and scans for unused CMS routes;
suspicious extensions or attack payloads;
absurd parameters;
missing assets;
URLs clearly generated by spam;
isolated human typos;
old pages that really were deleted;
known migration routes;
identified bad backlinks;
internal sitemap or redirect errors.

After these causes are removed, the remaining corpus becomes analyzable.
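
Some of these exclusions can be applied mechanically before any manual review. The sketch below is illustrative only: the rule names and regular expressions in `NOISE_RULES` are assumptions that would need tuning against real logs, and they cover only the pattern-based exclusions (typos, deleted pages, and bad backlinks require other checks). It keeps a reason for every excluded URL so the exclusion list stays auditable.

```python
import re
from urllib.parse import urlsplit

# Hypothetical rules; each pairs an exclusion reason with a path pattern.
NOISE_RULES = [
    ("cms_or_admin_scan", re.compile(r"^/(wp-admin|wp-login\.php|phpmyadmin)(/|$)", re.I)),
    ("config_file", re.compile(r"(^|/)\.(env|git|htaccess)(/|$)", re.I)),
    ("suspicious_extension", re.compile(r"\.(php|asp|aspx|cgi|sql|bak)$", re.I)),
    ("missing_asset", re.compile(r"\.(png|jpe?g|gif|ico|css|js|map|woff2?)$", re.I)),
]


def exclusion_reason(url):
    """Return the first matching exclusion reason, or None if the URL is kept."""
    path = urlsplit(url).path  # ignore the query string when matching
    for reason, pattern in NOISE_RULES:
        if pattern.search(path):
            return reason
    return None


def split_noise(urls):
    """Split URLs into (excluded with reasons, remaining candidates)."""
    excluded, remaining = [], []
    for url in urls:
        reason = exclusion_reason(url)
        if reason:
            excluded.append((url, reason))
        else:
            remaining.append(url)
    return excluded, remaining
```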

Step 3: verify historical non-existence

Non-existence must not be assumed. It must be verified.

Recommended checks:

CMS;
Git repository;
historical exports;
old sitemaps;
redirect rules;
content databases;
internal archives;
Search Console;
draft history;
public archives if relevant.

If the page has existed before, it is not a phantom URL in the strict sense. It may be historical persistence, citation remanence, or a redirect issue.
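
The check can be sketched as a lookup across path lists exported from each source. This is a minimal sketch under assumptions: the function names, the source labels, and the normalization (dropping the query string and trailing slash, lowercasing) are all hypothetical choices, and lowercasing in particular is only appropriate if the site treats paths as case-insensitive.

```python
def normalize(path):
    """Drop the query string and trailing slash, and lowercase the path."""
    return path.split("?", 1)[0].rstrip("/").lower() or "/"


def existence_evidence(path, sources):
    """Return the names of the sources in which the path was found.

    `sources` maps a source name (for example "old_sitemaps") to an
    iterable of paths exported from that source.
    """
    target = normalize(path)
    return sorted(
        name
        for name, paths in sources.items()
        if target in {normalize(p) for p in paths}
    )
```

An empty result only means that no trace was found in the sources checked. It is not proof of non-existence, so the list of sources consulted should be recorded alongside the result.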

Step 4: measure documentary coherence

A non-existent URL becomes interesting when it is coherent.

Qualification questions:

Does the slug follow the site’s conventions?
Does the path match a real category?
Does the vocabulary already appear in the corpus?
Does the page seem to extend an existing editorial family?
Does a neighboring page already exist?
Is the concept mentioned without being stabilized?
Does the URL correspond to a need for definition, method, proof, or clarification?

The more positive the answers, the closer the URL gets to a real latent documentary surface.
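
The first three questions lend themselves to a rough automated pre-check; the others call for editorial judgment. The sketch below assumes a lowercase, hyphen-separated slug convention and takes the site's categories and corpus vocabulary as plain sets. The name `coherence_checks`, the regular expression, and both inputs are assumptions for illustration.

```python
import re

# Assumed convention: lowercase alphanumeric words separated by single hyphens.
SLUG_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def coherence_checks(path, known_categories, corpus_terms):
    """Answer three of the qualification questions with simple heuristics."""
    segments = [s for s in path.strip("/").split("/") if s]
    slug = segments[-1] if segments else ""
    words = set(slug.split("-")) if slug else set()
    return {
        "slug_follows_convention": bool(SLUG_RE.match(slug)),
        "category_exists": len(segments) > 1 and segments[0] in known_categories,
        "vocabulary_in_corpus": bool(words & set(corpus_terms)),
    }
```

These heuristics only pre-sort candidates. A positive answer here is a prompt for review, not a qualification.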

Step 5: cluster

Do not analyze URLs one by one for too long. The useful signal often appears at cluster level.

Possible families:

definitions;
doctrines;
frameworks;
services;
guides;
comparisons;
policies;
proof surfaces;
use-case pages;
FR/EN variants;
singular/plural variants;
reformulations of the same concept.

A recurring cluster matters more than an isolated URL.
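
One way to surface such clusters is to reduce each URL to a normalized key and group on it. The sketch below is a deliberately naive illustration: `cluster_key`, the `LANG_PREFIXES` set, and the plural-stripping rule are assumptions, and real variants (translated slugs, irregular plurals) would need a better normalizer.

```python
from collections import defaultdict

LANG_PREFIXES = {"fr", "en"}  # assumed language path prefixes


def cluster_key(path):
    """Reduce a path to (family, sorted word stems of the slug)."""
    segments = [s for s in path.lower().strip("/").split("/") if s]
    if segments and segments[0] in LANG_PREFIXES:
        segments = segments[1:]
    if not segments:
        return ("", ())
    family = segments[0] if len(segments) > 1 else ""
    tokens = [t for t in segments[-1].split("-") if t]
    # Naive singularization: drop a trailing "s" on words longer than 3 letters.
    stems = sorted(t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens)
    return (family, tuple(stems))


def cluster(paths):
    """Group paths that share the same normalized key."""
    groups = defaultdict(list)
    for path in paths:
        groups[cluster_key(path)].append(path)
    return dict(groups)
```

Sorting the stems means that reordered reformulations of the same words fall into one group, at the cost of occasionally merging slugs whose word order actually matters.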

Step 6: score

Criterion	Question	Score
Historical non-existence	Has the URL never existed?	0 to 3
Slug coherence	Does the path follow site conventions?	0 to 3
Semantic coherence	Is the concept related to the corpus?	0 to 3
Documentary proximity	Does a neighboring page exist?	0 to 3
Recurrence	Does the URL or cluster return?	0 to 3
Source	Does the context suggest an agent, AI tool, or AI referral?	0 to 3
Strategic value	Would the surface strengthen the canon?	0 to 3
Confusion risk	Does the absence leave risky inference space?	0 to 3
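
With eight criteria scored from 0 to 3, an unweighted total ranges from 0 to 24. The grid above does not specify weights or decision thresholds, so the sketch below assumes a plain sum and only validates the inputs. The criterion keys and the function name `total_score` are illustrative.

```python
CRITERIA = (
    "historical_non_existence",
    "slug_coherence",
    "semantic_coherence",
    "documentary_proximity",
    "recurrence",
    "source",
    "strategic_value",
    "confusion_risk",
)


def total_score(scores):
    """Sum the eight criterion scores (each 0 to 3). Unweighted by assumption."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    for criterion in CRITERIA:
        value = scores[criterion]
        if not isinstance(value, int) or not 0 <= value <= 3:
            raise ValueError(f"{criterion} must be an integer from 0 to 3")
    return sum(scores[c] for c in CRITERIA)
```

Any cutoff applied to the total, such as a minimum score before a cluster is considered for creation, would be an editorial choice to document separately.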

Step 7: decide

Create

Create when the URL reveals a real documentary gap and a new surface can strengthen the canon without creating duplication.

Redirect

Redirect when the intent is clear and an existing page already answers correctly.

Clarify

Clarify when the phantom URL reveals confusion between two concepts, two services, two levels of authority, or two categories.

Exclude

Exclude when the URL reveals a false expectation. In that case, a negative definition, clarification, or out-of-scope statement may be more useful than a positive page.

Link

Reinforce internal linking when the content already exists but the documentary dependency is not explicit enough.

Monitor

Monitor when the cluster is interesting but insufficient to justify action.

Leave as 404

Leave as 404 when the signal is weak, hostile, non-strategic, or misleading.

Audit deliverables

A complete audit should produce:

a list of excluded URLs with exclusion reasons;
a list of candidate phantom URLs;
a cluster map;
a score per cluster;
an origin hypothesis;
an editorial decision;
a priority;
monitoring checkpoints at 30, 60, and 90 days.
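
The per-cluster deliverables can be gathered into one record so that every decision stays documented. This is a sketch under assumptions: `Decision`, `ClusterEntry`, and their field names are hypothetical, and the checkpoints are simply counted in calendar days from the decision date.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Decision(Enum):
    CREATE = "create"
    REDIRECT = "redirect"
    LINK = "reinforce_linking"
    CLARIFY = "clarify"
    EXCLUDE = "exclude"
    MONITOR = "monitor"
    LEAVE_404 = "leave_as_404"
    GONE_410 = "respond_410"


@dataclass
class ClusterEntry:
    """One audited cluster. Field names are illustrative."""
    cluster_id: str
    urls: list
    score: int
    origin_hypothesis: str
    decision: Decision
    priority: str
    decided_on: date

    def monitoring_checkpoints(self):
        """Return the 30, 60, and 90-day review dates."""
        return [self.decided_on + timedelta(days=d) for d in (30, 60, 90)]
```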

Prudence rule

A phantom URL audit must never claim to read inside a model. It observes traces, qualifies paths, and governs documentary decisions.

The value is not in speculative attribution. It is in reducing the gap between the published corpus, the expected corpus, and the governed corpus.