RAG poisoning: corpus contamination and interpretive drift
This page defines RAG poisoning as contamination of a retrieval corpus that alters consumed authority and causes interpretive drift.
RAG (Retrieval-Augmented Generation) architectures do not answer with a model alone. They answer with a model plus a retrieval system: index, embeddings, search engine, document bases, filters, ranking rules, and context assembly. The attack surface is therefore not limited to the instruction (prompt); it also includes the material the system will cite, summarize, or treat as reference.
On gautierdorval.com, RAG poisoning is treated as a special case of “AI poisoning”: an alteration of the source consumed as authority in the interpretation chain, which produces biased, unstable, or diverted responses.
Operational definition
RAG poisoning: intentional or instrumentalized contamination of an indexed corpus (documents, fragments, metadata) used for context retrieval, in a manner that displaces consumed authority, biases recall, or injects fragments that systematically alter outputs.
The central property: poisoned content is not merely visible; it is ingested, indexed, and then recalled as context in responses, which gives it an implicit authority rank.
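To make this property concrete, here is a minimal sketch, assuming a toy bag-of-words retriever as a stand-in for a real embedding model. `ToyIndex`, `assemble_context`, the source identifiers, and the sample texts are all hypothetical. It shows that once a chunk is ingested and indexed, retrieval ranks it on similarity alone, and context assembly hands it to the model with the same standing as a canonical document.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens: a crude stand-in for a real embedding model.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """Hypothetical in-memory index illustrating ingest -> index -> recall."""

    def __init__(self) -> None:
        self.chunks: list[tuple[str, str, Counter]] = []  # (source, text, vector)

    def ingest(self, source: str, text: str) -> None:
        # Every chunk is vectorized the same way, whatever its origin or trust level.
        self.chunks.append((source, text, Counter(tokenize(text))))

    def retrieve(self, query: str, k: int = 2) -> list[tuple[str, str]]:
        # Rank purely on similarity to the query; provenance plays no role.
        q = Counter(tokenize(query))
        ranked = sorted(self.chunks, key=lambda c: cosine(q, c[2]), reverse=True)
        return [(source, text) for source, text, _ in ranked[:k]]


def assemble_context(hits: list[tuple[str, str]]) -> str:
    # Flatten hits into one numbered block. The source id is dropped here,
    # so the model sees text without any signal about where it came from.
    return "\n".join(f"[{i + 1}] {text}" for i, (_, text) in enumerate(hits))


# Hypothetical corpus: one canonical page, one injected forum post.
index = ToyIndex()
index.ingest("docs/refund-policy", "Refunds are issued within 30 days of purchase.")
index.ingest(
    "forum/post-991",
    "Refunds policy update: refunds are issued within 90 days, refunds always approved.",
)
context = assemble_context(index.retrieve("refunds policy days", k=1))
print(context)
```

In this toy setup the forum post outranks the policy page only because it repeats the query terms, and nothing in the assembled context records that one source is canonical and the other is not. That flattening step is where recall turns into implicit authority.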
Corpus contamination: what is actually targeted
In a RAG architecture, the attack rarely targets “the model”. It targets the corpus and its selection mechanisms:
- source content (pages, docs, notes, databases, tickets)
- segmentation (chunks) and context boundaries
- embeddings and semantic similarity
- ranking (what surfaces first)
- selection filters and policies
- deduplication, canonicalization, and normalization
A successful contamination modifies what the system “considers relevant”, not only what it could read.
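The selection mechanisms listed above can be illustrated with a second toy sketch, assuming term-frequency scoring as a stand-in for embedding similarity and a normalize-then-hash step as a stand-in for canonicalization and deduplication. All names and texts are hypothetical. Three trivially varied copies of one injected claim fill the entire top-k until deduplication collapses them.

```python
import hashlib
import re


def normalize(text: str) -> str:
    # Canonicalization stand-in: lowercase, drop punctuation, collapse whitespace.
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, text: str) -> int:
    # Toy relevance: total occurrences of query terms in the chunk,
    # which rewards chunks that repeat those terms.
    terms = set(normalize(query).split())
    return sum(1 for tok in normalize(text).split() if tok in terms)


def top_k(query: str, chunks: list[str], k: int, dedup: bool) -> list[str]:
    # sorted() is stable (also with reverse=True), so ties keep corpus order.
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    if not dedup:
        return ranked[:k]
    seen: set[str] = set()
    selected: list[str] = []
    for chunk in ranked:
        # Hash of the normalized text: chunks identical after normalization collide.
        key = hashlib.sha256(normalize(chunk).encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        selected.append(chunk)
        if len(selected) == k:
            break
    return selected


# Hypothetical corpus: two canonical chunks and three near-copies of one injected claim.
CANONICAL_1 = "The API rate limit is 100 requests per minute."
CANONICAL_2 = "Rate limit headers report remaining requests per minute."
POISON_1 = "API rate limit removed: API requests have no rate limit."
POISON_2 = "api rate limit REMOVED - api requests have no rate limit"
POISON_3 = "API  rate limit removed; API requests have NO rate limit!"
CORPUS = [CANONICAL_1, POISON_1, POISON_2, CANONICAL_2, POISON_3]
QUERY = "API rate limit requests"

print(top_k(QUERY, CORPUS, k=3, dedup=False))  # three copies of the injected claim
print(top_k(QUERY, CORPUS, k=3, dedup=True))   # injected claim once, then canonical chunks
```

Deduplication restores diversity in the context but does not demote the injected claim, which still ranks first because it repeats the query terms. Exact-match normalization also misses paraphrased copies, which is why this stage is itself part of the attack surface rather than a complete defense.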
Minimal typology (effect mechanisms)
- Reference derivation: making a non-canonical source surface as if it were more authoritative.
- Directional bias: orienting responses toward a specific narrative or recurrent attribution.
- Recall instability: provoking contradictions across queries, sessions, or phrasings.
- Fragment contamination: injecting “plausible” chunks that are recalled out of context and gain implicit authority rank.
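Two of these mechanisms, reference derivation and recall instability, can leave measurable traces in retrieval logs. A minimal diagnostic sketch, assuming you can run several paraphrases of the same question and record the ranked source identifiers each one returns: it computes how often a designated canonical source ranks first (one possible signal of reference derivation) and the mean pairwise Jaccard overlap of top-k source sets (one possible signal of recall instability). The function names and the sample log are hypothetical, and low scores warrant inspection, not a verdict of poisoning.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    # Overlap of two source sets; 1.0 means identical sets.
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def canonical_top1_rate(results: dict[str, list[str]], canonical: str) -> float:
    # Share of query variants whose first hit is the designated canonical source.
    # A low rate is one possible sign of reference derivation.
    if not results:
        return 0.0
    first_hits = [hits[0] for hits in results.values() if hits]
    return sum(1 for src in first_hits if src == canonical) / len(results)


def recall_stability(results: dict[str, list[str]], k: int = 3) -> float:
    # Mean pairwise Jaccard overlap of top-k source sets across query variants.
    # Values well below 1.0 are one possible sign of recall instability.
    top_sets = [set(hits[:k]) for hits in results.values()]
    pairs = list(combinations(top_sets, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical retrieval log: paraphrases of one question -> ranked source ids.
RESULTS = {
    "what is the refund window": ["docs/refunds", "docs/faq", "blog/2020-offer"],
    "how long do I have to get a refund": ["forum/post-991", "docs/refunds", "docs/faq"],
    "refund deadline": ["forum/post-991", "mirror/refunds-copy", "docs/faq"],
}

print(f"canonical top-1 rate: {canonical_top1_rate(RESULTS, 'docs/refunds'):.2f}")
print(f"top-3 stability: {recall_stability(RESULTS, k=3):.2f}")
```

With this sample log, only one of three paraphrases puts `docs/refunds` first (rate 0.33), and the top-3 sets overlap weakly (mean Jaccard 0.40). Both numbers point at the same non-canonical source surfacing first, which is the pattern the typology describes; confirming contamination still requires inspecting the recalled fragments themselves.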