Errors produced by AI systems are often associated with obvious absurdities: incoherent answers, plainly false facts, or contradictory claims.
Those errors exist, but they are not the main risk observed in the field. The most problematic drifts are of a different order: coherent, plausible, and often undetectable without close analysis.
To situate this observation within a broader frame, see Positioning.
What a coherent hallucination is
A coherent hallucination is an erroneous output that still respects the internal logic of the system.
It does not contradict the available signals. It extends them in a plausible way.
To a human reader, as to a third-party system, it appears reasonable, structured, and credible.
The danger is not the absurd error, but the error that looks true.
Why these hallucinations go unnoticed
Coherent hallucinations embed themselves in environments that are already interpretable.
They rely on existing structures, familiar analogies, and precedents observed elsewhere.
In the absence of an explicit contradictory signal, they trigger no alert mechanism.
When coherence becomes a risk factor
A coherent hallucination is not corrected because it does not look erroneous.
It can be repeated, synthesized, and cited as reliable information.
Gradually, it becomes part of chains of interlinked answers.
The progressive normalization of plausible error
In the field, a recurring pattern appears:
- an initially plausible extrapolation,
- its reuse in a synthesis or reformulation,
- a cross-citation by another system,
- its stabilization as an implicit fact.
At each step, the error gains legitimacy.
In current ecosystems, that chain does not stop there. Once stabilized, a coherent hallucination becomes a premise for new extrapolations.
It is then used as a starting point by other systems, which extend it, generalize it, or specialize it, creating a self-reinforcing loop: the plausible error validates itself through cross-repetition.
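A toy simulation can make this loop concrete. The model below is an illustrative assumption, not a description of any real system: each "system" holds a confidence score for a single erroneous but plausible claim, and every round adopts the highest confidence among its peers plus a small plausibility bonus. All names and numbers are invented for the example.

```python
# Toy model of cross-citation amplification (illustrative assumption,
# not a real architecture): one system starts with a moderately
# confident but erroneous extrapolation; every round, each system
# adopts the highest confidence among its peers, plus a small bonus
# for fitting existing structure. Nothing ever checks ground truth.

N_SYSTEMS = 5
PLAUSIBILITY_BONUS = 0.05          # assumed reward for sounding plausible
confidence = [0.0] * N_SYSTEMS
confidence[0] = 0.6                # the initially plausible extrapolation

for step in range(1, 6):
    snapshot = confidence[:]
    for i in range(N_SYSTEMS):
        cited = max(snapshot[j] for j in range(N_SYSTEMS) if j != i)
        if cited > 0:
            # Repetition alone raises confidence; no new evidence enters.
            confidence[i] = min(1.0, max(snapshot[i], cited) + PLAUSIBILITY_BONUS)
    print(f"step {step}:", " ".join(f"{c:.2f}" for c in confidence))
```

After a few rounds, every system is more confident in the claim than the one that originated it, even though no evidence has been added: repetition substitutes for verification.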
Why correction arrives too late
By the time a coherent hallucination is identified, it has often already spread.
It has been integrated into persistent graphs, caches, or derived models.
Local correction does not reverse a self-sustaining normalization.
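The sequencing is the point. As a minimal sketch, assuming three hypothetical stores (a source, a search cache, a derived model, all invented for this example), the snippet below shows why a correction applied only at the source, after propagation, leaves the copies intact:

```python
# Sketch under assumed components: "source", "search_cache", and
# "derived_model" are hypothetical stores, not a real architecture.
claim = "plausible but erroneous claim"

stores = {
    "source": {claim},
    "search_cache": set(),
    "derived_model": set(),
}

# Propagation happens first: downstream stores copy from upstream.
stores["search_cache"].add(claim)       # cached from the source
stores["derived_model"].add(claim)      # distilled from the cache

# The local correction arrives afterwards and only reaches the source.
stores["source"].discard(claim)

survivors = [name for name, contents in stores.items() if claim in contents]
print("claim still present in:", survivors)  # ['search_cache', 'derived_model']
```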
What these observations reveal
Coherent hallucinations are not isolated accidents. They are fostered by semantically permissive environments.
When perimeters are blurry, exclusions absent, and hierarchy incoherent, the system is left with wide interpretive freedom.
Under those conditions, coherence becomes a misleading substitute for truthfulness.
Prevention rather than correction
Field observations converge on a clear finding: structural prevention is more effective than a posteriori correction.
Reducing the error space, making exclusions explicit, and governing interpretation all limit the production of coherent hallucinations.
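One way to picture what "making exclusions explicit" means in practice is a validation layer that only accepts outputs inside a declared perimeter. The sketch below is an assumption, not a prescribed implementation: the topic list, the excluded claim markers, and the validate function are all invented for illustration.

```python
# Illustrative sketch: ALLOWED_TOPICS, EXCLUDED_CLAIM_MARKERS, and
# validate() are invented for this example, not an existing API.
ALLOWED_TOPICS = {"pricing", "product_specs", "support_hours"}
EXCLUDED_CLAIM_MARKERS = ("guaranteed", "always", "never fails")

def validate(topic: str, answer: str) -> str:
    """Accept an answer only if it stays inside the declared perimeter."""
    if topic not in ALLOWED_TOPICS:
        raise ValueError(f"out of perimeter: {topic!r}")
    lowered = answer.lower()
    for marker in EXCLUDED_CLAIM_MARKERS:
        if marker in lowered:
            raise ValueError(f"explicitly excluded claim pattern: {marker!r}")
    return answer

print(validate("pricing", "The basic plan costs 10 euros per month."))

try:
    validate("legal_advice", "You should always ...")
except ValueError as err:
    print("rejected:", err)             # out of perimeter: 'legal_advice'
```

The design choice matters more than the code: an out-of-perimeter question is refused rather than answered plausibly, so the error space shrinks before generation instead of being patched after it.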
That asymmetry reveals an informational responsibility: the absence of constraint is not neutral; it contributes to the formation of derived collective facts.
This goes beyond technique and involves a broader responsibility, developed in particular in Why semantic governance is not optional.
Conclusion
Coherent hallucinations are the major risk in interpretive systems.
Because they look true, they spread and reinforce themselves without resistance.
In an interpreted and interconnected web, designing semantically constrained environments is the only durable way to contain this plausible contagion.
To situate the field of intervention associated with these observations, see About Gautier Dorval.