Framework

Phantom URL audit

Phantom URL audit provides a method to qualify non-existent but plausible URLs, cluster them, and decide whether to create, redirect, clarify, or keep a 404.

Collection: Framework
Type: Framework
Layer: transversal
Version: 1.0
Stabilization: 2026-05-13
Published: 2026-05-13
Updated: 2026-05-13

Governance artifacts

Governance files brought into scope by this page

This page is anchored to published surfaces that declare identity, precedence, limits, and the corpus reading conditions. Their order below gives the recommended reading sequence.

  1. Content inventory
  2. site-coherence-map.md
  3. Q-Metrics JSON

Discovery and routing #01

Content inventory

/site-content-index.json

Machine-first inventory of the pages, articles, and surfaces published on the site.

Governs
Discoverability, crawl orientation, and the mapping of published surfaces.
Bounds
Incomplete readings that ignore structure, routes, or the preferred markdown surface.

Does not guarantee: A good discovery surface improves access; it is not sufficient on its own to govern reconstruction.

Artifact #02

site-coherence-map.md

/site-coherence-map.md

Published machine-first governance surface.

Governs
Part of the corpus reading conditions.
Bounds
An inference zone that would otherwise remain implicit.

Does not guarantee: This file does not, on its own, guarantee system obedience.

Observability #03

Q-Metrics JSON

/.well-known/q-metrics.json

Descriptive metrics surface for observing gaps, snapshots, and comparisons.

Governs
The description of gaps, drifts, snapshots, and comparisons.
Bounds
Confusion between observed signal, fidelity proof, and actual steering.

Does not guarantee: An observation surface documents an effect; it does not, on its own, guarantee representation.

Evidence layer

Probative surfaces brought into scope by this page

This page does more than point to governance files. It is also anchored to surfaces that make observation, traceability, fidelity, and audit more reconstructible. Their order below makes the minimal evidence chain explicit.

  1. Canon and scope: Definitions canon
  2. Weak observation: Q-Ledger
  3. Derived measurement: Q-Metrics

Canonical foundation #01

Definitions canon

/canon.md

Opposable base for identity, scope, roles, and negations that must survive synthesis.

Makes provable
The reference corpus against which fidelity can be evaluated.
Does not prove
That a system already consults it, or that an observed response stays faithful to it.
Use when
Before any observation, test, audit, or correction.

Observation ledger #02

Q-Ledger

/.well-known/q-ledger.json

Public ledger of inferred sessions that makes some observed consultations and sequences visible.

Makes provable
That a behavior was observed as weak, dated, contextualized trace evidence.
Does not prove
Actor identity, system obedience, or strong proof of activation.
Use when
When it is necessary to distinguish descriptive observation from strong attestation.

Descriptive metrics #03

Q-Metrics

/.well-known/q-metrics.json

Derived layer that makes some variations more comparable from one snapshot to another.

Makes provable
That an observed signal can be compared, versioned, and challenged as a descriptive indicator.
Does not prove
The truth of a representation, the fidelity of an output, or real steering on its own.
Use when
To compare windows, prioritize an audit, and document a before/after.

Phantom URL audit

This framework turns suspicious 404s into interpretive audit data. Its goal is not to create every requested page. Its goal is to distinguish noise, historical errors, hostile scans, and real phantom URLs.

A useful phantom URL is not only non-existent. It is non-existent and plausible. The work therefore consists in verifying non-existence, measuring coherence, clustering patterns, and producing a governed decision.

Expected outcome

At the end of the audit, each phantom URL cluster should be classified into one of the following decisions:

  • create a canonical surface;
  • redirect to an existing page;
  • reinforce internal linking or the coherence map;
  • publish a clarification;
  • publish an exclusion or negative definition;
  • monitor without acting;
  • intentionally leave as 404;
  • respond with 410 if invalidity must be explicit.

The decision must be documented. A phantom URL does not automatically become an editorial priority.
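
As a minimal sketch of what "documented" could mean in practice, the eight outcomes above can be held in an enumeration and paired with a mandatory rationale. The names `Decision` and `ClusterDecision`, and the cluster label used in the example, are assumptions made for illustration and do not come from the framework.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    """The eight outcomes listed above (member names are illustrative)."""
    CREATE = "create a canonical surface"
    REDIRECT = "redirect to an existing page"
    REINFORCE = "reinforce internal linking or the coherence map"
    CLARIFY = "publish a clarification"
    EXCLUDE = "publish an exclusion or negative definition"
    MONITOR = "monitor without acting"
    LEAVE_404 = "intentionally leave as 404"
    GONE_410 = "respond with 410"


@dataclass(frozen=True)
class ClusterDecision:
    cluster: str          # hypothetical cluster label
    decision: Decision
    rationale: str        # the documented reason; must not be blank

    def __post_init__(self) -> None:
        # Refuse undocumented decisions outright.
        if not self.rationale.strip():
            raise ValueError("a decision must be documented with a rationale")


example = ClusterDecision(
    cluster="definitions/phantom-url",
    decision=Decision.MONITOR,
    rationale="Recurring but low volume; revisit after the next window.",
)
```

Making the rationale a required, non-blank field is one design choice among several; a free-text log kept next to the cluster map would serve the same purpose.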

Step 1: collect traces

Collect data over a long enough period to distinguish accidents from regularities.

Possible sources:

  • server logs;
  • CDN logs;
  • 404 reports;
  • application logs;
  • analytics;
  • Search Console;
  • 404 monitoring tools;
  • referrals from AI assistants;
  • URLs cited in generative answers;
  • controlled prompt tests.

Minimum fields to keep:

  • requested URL;
  • response status;
  • timestamp;
  • referrer;
  • user-agent;
  • IP address or aggregated fingerprint;
  • country or region if available;
  • previous entry page if observable;
  • frequency and recurrence.
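
The minimum fields can be sketched as a small record type, with frequency and recurrence derived from the collected traces and not stored per request. The field names, the `Trace` class, and the sample paths are assumptions for illustration; the framework does not publish a schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Trace:
    """One observed request (illustrative field names, not a published schema)."""
    url: str                             # requested URL (path only in this sketch)
    status: int                          # response status, e.g. 404
    timestamp: datetime
    referrer: Optional[str] = None
    user_agent: Optional[str] = None
    fingerprint: Optional[str] = None    # IP address or aggregated fingerprint
    region: Optional[str] = None         # country or region if available
    entry_page: Optional[str] = None     # previous entry page if observable


def recurrence(traces: list[Trace]) -> dict[str, tuple[int, int]]:
    """Per URL, return (total hits, number of distinct days with a hit)."""
    hits: dict[str, int] = {}
    days: dict[str, set] = {}
    for t in traces:
        hits[t.url] = hits.get(t.url, 0) + 1
        days.setdefault(t.url, set()).add(t.timestamp.date())
    return {url: (hits[url], len(days[url])) for url in hits}


# Hypothetical sample: three hits on one path across two days, one scan hit.
sample = [
    Trace("/definitions/phantom-url", 404, datetime(2026, 5, 1, 9, 0, tzinfo=timezone.utc)),
    Trace("/definitions/phantom-url", 404, datetime(2026, 5, 1, 17, 30, tzinfo=timezone.utc)),
    Trace("/definitions/phantom-url", 404, datetime(2026, 5, 9, 8, 15, tzinfo=timezone.utc)),
    Trace("/wp-admin/", 404, datetime(2026, 5, 2, 3, 0, tzinfo=timezone.utc)),
]
summary = recurrence(sample)
```

Counting distinct days alongside raw hits is one simple way to separate a burst on a single day from a pattern that returns over time.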

Step 2: exclude obvious noise

Before interpretation, remove access patterns that belong to technical or hostile noise.

Typical exclusions:

  • /wp-admin/, /phpmyadmin/, configuration files, and scans for unused CMS routes;
  • suspicious extensions or attack payloads;
  • absurd parameters;
  • missing assets;
  • URLs clearly generated by spam;
  • isolated human typos;
  • old pages that really were deleted;
  • known migration routes;
  • identified bad backlinks;
  • internal sitemap or redirect errors.

After these causes are removed, the remaining corpus becomes analyzable.
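
A first pass over some of these exclusions can be sketched with a few regular expressions that return a reason, so that the excluded list keeps its justification. The patterns and reason labels below are illustrative assumptions, cover only part of the list above, and expect paths without query strings; a real audit would tune them to its own logs.

```python
import re
from typing import Optional

# Illustrative rules only: (reason label, pattern). Order matters, first match wins.
NOISE_RULES = [
    ("cms_scan", re.compile(r"^/(wp-admin|wp-login\.php|phpmyadmin)(/|$)", re.I)),
    ("config_file", re.compile(r"(^|/)\.(env|git)(/|$)|\.(ini|bak|sql)$", re.I)),
    ("missing_asset", re.compile(r"\.(png|jpe?g|gif|svg|ico|css|js|woff2?|map)$", re.I)),
    ("attack_payload", re.compile(r"(\.\./|<script|%3Cscript|union\s+select)", re.I)),
]


def exclusion_reason(path: str) -> Optional[str]:
    """Return the first matching noise reason, or None if the path survives."""
    for reason, pattern in NOISE_RULES:
        if pattern.search(path):
            return reason
    return None


def split_noise(paths: list[str]) -> tuple[dict[str, str], list[str]]:
    """Separate paths into {excluded path: reason} and the remaining corpus."""
    excluded: dict[str, str] = {}
    remaining: list[str] = []
    for p in paths:
        reason = exclusion_reason(p)
        if reason is None:
            remaining.append(p)
        else:
            excluded[p] = reason
    return excluded, remaining


excluded, remaining = split_noise([
    "/wp-admin/",
    "/.env",
    "/favicon.ico",
    "/a/../../etc/passwd",
    "/definitions/phantom-url",   # hypothetical candidate path
])
```

Exclusions that depend on site history (deleted pages, migration routes, bad backlinks) cannot be expressed as patterns and are left to the historical check in the next step.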

Step 3: verify historical non-existence

Non-existence must not be assumed. It must be verified.

Recommended checks:

  • CMS;
  • Git repository;
  • historical exports;
  • old sitemaps;
  • redirect rules;
  • content databases;
  • internal archives;
  • Search Console;
  • draft history;
  • public archives if relevant.

If the page has existed before, it is not a phantom URL in the strict sense. It may be historical persistence, citation remanence, or a redirect issue.
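
The verification can be sketched as a lookup of each candidate path in every historical source that has been exported to a set of paths. The source names, the sample paths, and the normalization (lowercasing and dropping a trailing slash, which assumes case-insensitive routes) are assumptions for illustration.

```python
def normalize(path: str) -> str:
    """Lowercase and drop a trailing slash (assumes case-insensitive routes)."""
    return path.rstrip("/").lower() or "/"


def classify_history(path: str, sources: dict[str, set[str]]) -> tuple[str, list[str]]:
    """Return ("historical", [source names]) if any source knew the path,
    otherwise ("phantom_candidate", [])."""
    norm = normalize(path)
    found = sorted(
        name for name, paths in sources.items()
        if norm in {normalize(p) for p in paths}
    )
    return ("historical" if found else "phantom_candidate", found)


# Hypothetical exports from three of the checks listed above.
known = {
    "old_sitemaps": {"/guides/404-monitoring/"},
    "redirect_rules": {"/guides/404-monitoring", "/services/audit"},
    "git_history": set(),
}

seen_before = classify_history("/Guides/404-Monitoring", known)
never_seen = classify_history("/definitions/phantom-url", known)
```

Returning the names of the sources that matched keeps the evidence attached to the verdict, which helps when deciding whether the case is historical persistence or a redirect issue.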

Step 4: measure documentary coherence

A non-existent URL becomes interesting when it is coherent.

Qualification questions:

  • Does the slug follow the site’s conventions?
  • Does the path match a real category?
  • Does the vocabulary already appear in the corpus?
  • Does the page seem to extend an existing editorial family?
  • Does a neighboring page already exist?
  • Is the concept mentioned without being stabilized?
  • Does the URL correspond to a need for definition, method, proof, or clarification?

The more positive the answers, the closer the URL gets to a real latent documentary surface.
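
Some of these questions can be pre-answered mechanically before editorial review. The sketch below covers only the first three (slug convention, category match, vocabulary overlap) and assumes a lowercase kebab-case slug convention, a known set of top-level categories, and a precomputed corpus vocabulary; all three are hypothetical inputs, and the remaining questions still need human judgment.

```python
import re

# Assumed convention: lowercase kebab-case slugs such as "phantom-url-audit".
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def coherence_answers(path: str, categories: set[str], vocabulary: set[str]) -> dict[str, bool]:
    """Answer the three mechanically checkable questions for one path."""
    segments = [s for s in path.strip("/").split("/") if s]
    slug = segments[-1] if segments else ""
    words = set(slug.split("-"))
    return {
        "slug_follows_conventions": bool(SLUG_RE.match(slug)),
        "path_matches_category": len(segments) > 1 and segments[0] in categories,
        "vocabulary_in_corpus": bool(words & vocabulary),
    }


categories = {"frameworks", "definitions"}   # hypothetical site categories
vocabulary = {"phantom", "url", "audit"}     # hypothetical corpus terms

coherent = coherence_answers("/frameworks/phantom-url-clustering", categories, vocabulary)
incoherent = coherence_answers("/Frameworks/Phantom_URL", categories, vocabulary)
positive_count = sum(coherent.values())
```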

Step 5: cluster

Do not analyze URLs one by one for too long. The useful signal often appears at cluster level.

Possible families:

  • definitions;
  • doctrines;
  • frameworks;
  • services;
  • guides;
  • comparisons;
  • policies;
  • proof surfaces;
  • use-case pages;
  • FR/EN variants;
  • singular/plural variants;
  • reformulations of the same concept.

A recurring cluster matters more than an isolated URL.
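
One way to surface such clusters is to reduce each path to a normalized key, so that language variants, singular/plural variants, and reordered slugs fall together. The sketch below is deliberately naive: the language prefixes, the "drop one trailing s" stemming, and the sample paths are assumptions, and it does not catch reformulations that use different words.

```python
from collections import defaultdict

LANG_PREFIXES = {"fr", "en"}   # assumed language path prefixes


def stem(token: str) -> str:
    """Naive singularization: drop one trailing 's' from tokens longer than 3 characters."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def cluster_key(path: str) -> tuple[str, tuple[str, ...]]:
    """Return (family, sorted slug stems) after removing a language prefix."""
    segments = [s for s in path.lower().strip("/").split("/") if s]
    if segments and segments[0] in LANG_PREFIXES:
        segments = segments[1:]
    if not segments:
        return ("", ())
    family = stem(segments[0]) if len(segments) > 1 else ""
    stems = tuple(sorted({stem(t) for t in segments[-1].split("-")}))
    return (family, stems)


def cluster(paths: list[str]) -> dict[tuple[str, tuple[str, ...]], list[str]]:
    """Group paths that share the same normalized key."""
    groups: dict[tuple[str, tuple[str, ...]], list[str]] = defaultdict(list)
    for p in paths:
        groups[cluster_key(p)].append(p)
    return dict(groups)


groups = cluster([
    "/frameworks/phantom-url-audit",
    "/fr/frameworks/phantom-urls-audit",
    "/framework/audit-phantom-url/",
    "/definitions/phantom-url",
])
```

Sorting the stems makes the key insensitive to word order, which is a lossy choice: it merges reorderings of the same words but can also merge slugs that mean different things.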

Step 6: score

| Criterion | Question | Score |
| --- | --- | --- |
| Historical non-existence | Has the URL never existed? | 0 to 3 |
| Slug coherence | Does the path follow site conventions? | 0 to 3 |
| Semantic coherence | Is the concept related to the corpus? | 0 to 3 |
| Documentary proximity | Does a neighboring page exist? | 0 to 3 |
| Recurrence | Does the URL or cluster return? | 0 to 3 |
| Source | Does the context suggest an agent, AI tool, or AI referral? | 0 to 3 |
| Strategic value | Would the surface strengthen the canon? | 0 to 3 |
| Confusion risk | Does the absence leave risky inference space? | 0 to 3 |

Recommended reading (eight criteria scored 0 to 3 give a total between 0 and 24):

  • 0 to 6: probable noise.
  • 7 to 12: weak signal to monitor.
  • 13 to 18: plausible phantom URL.
  • 19 to 24: probable latent documentary surface; editorial decision required.
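
The scoring can be sketched as a sum over the eight criteria followed by a band lookup. Because each criterion is scored 0 to 3, the total is bounded by 24, so the sketch rejects anything outside 0 to 24. The criterion keys and the example scores are assumptions for illustration.

```python
# Keys mirror the eight criteria of the scoring table (identifiers are illustrative).
CRITERIA = (
    "historical_non_existence",
    "slug_coherence",
    "semantic_coherence",
    "documentary_proximity",
    "recurrence",
    "source",
    "strategic_value",
    "confusion_risk",
)

# (inclusive upper bound, reading) for each reachable band.
BANDS = [
    (6, "probable noise"),
    (12, "weak signal to monitor"),
    (18, "plausible phantom URL"),
    (24, "probable latent documentary surface"),
]


def total_score(scores: dict[str, int]) -> int:
    """Sum the eight criterion scores after checking completeness and range."""
    missing = set(CRITERIA) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    for name in CRITERIA:
        if not 0 <= scores[name] <= 3:
            raise ValueError(f"{name} must be between 0 and 3")
    return sum(scores[name] for name in CRITERIA)


def reading(total: int) -> str:
    """Map a total between 0 and 24 to its recommended reading."""
    if total < 0:
        raise ValueError("total cannot be negative")
    for upper, label in BANDS:
        if total <= upper:
            return label
    raise ValueError("total exceeds the maximum of 24")


# Hypothetical cluster scores: 3 + 3 + 2 + 2 + 1 + 1 + 2 + 1 = 15.
example_scores = dict(zip(CRITERIA, (3, 3, 2, 2, 1, 1, 2, 1)))
example_total = total_score(example_scores)
example_reading = reading(example_total)
```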

Step 7: decide

Create

Create when the URL reveals a real documentary gap and a new surface can strengthen the canon without creating duplication.

Redirect

Redirect when the intent is clear and an existing page already answers correctly.

Clarify

Clarify when the phantom URL reveals confusion between two concepts, two services, two levels of authority, or two categories.

Exclude

Exclude when the URL reveals a false expectation. In that case, a negative definition, clarification, or out-of-scope statement may be more useful than a positive page.

Reinforce

Reinforce internal linking when the content already exists but the documentary dependency is not explicit enough.

Monitor

Monitor when the cluster is interesting but insufficient to justify action.

Leave as 404

Leave as 404 when the signal is weak, hostile, non-strategic, or misleading.

Audit deliverables

A complete audit should produce:

  • a list of excluded URLs with exclusion reasons;
  • a list of candidate phantom URLs;
  • a cluster map;
  • a score per cluster;
  • an origin hypothesis;
  • an editorial decision;
  • a priority;
  • monitoring windows at 30, 60, and 90 days.
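
The monitoring checkpoints can be derived from the date a decision is recorded. This is a small sketch assuming the three windows are counted in calendar days from that date; the function name and the sample date are illustrative.

```python
from datetime import date, timedelta


def monitoring_checkpoints(decided_on: date) -> dict[int, date]:
    """Return the review dates 30, 60, and 90 calendar days after a decision."""
    return {days: decided_on + timedelta(days=days) for days in (30, 60, 90)}


# Arbitrary sample date, used only to show the arithmetic.
checkpoints = monitoring_checkpoints(date(2026, 5, 13))
```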

Prudence rule

A phantom URL audit must never claim to read inside a model. It observes traces, qualifies paths, and governs documentary decisions.

The value is not in speculative attribution. It is in reducing the gap between the published corpus, the expected corpus, and the governed corpus.