AI poisoning: definition and taxonomy
This page defines “AI poisoning” operationally and proposes a readable taxonomy, in order to avoid confusion, semantic drift, and misleading analogies.
In AI systems, “poisoning” is often used as a catch-all term: sometimes it designates training-corpus poisoning, sometimes the injection of elements into a RAG base, sometimes the corruption of agentic memory. This ambiguity encourages implicit interpretations and faulty diagnoses.
On gautierdorval.com, “AI poisoning” is treated as the intentional or instrumentalized corruption of an authority source in a system’s interpretation chain. It is neither a rhetorical effect nor mere “disinformation”, but an action that aims to degrade, bias, divert, or destabilize the production of responses.
Status of this page
This page is an interpretive clarification.
It aims to stabilize internal usage of the term, set reading bounds, and provide a functional taxonomy. It does not standardize external vocabulary and does not claim to cover the whole of security research.
Operational definition
AI poisoning: the deliberate (or deliberately enabled) alteration of a data flow, knowledge base, or memory mechanism, so as to produce a systematic drift in an AI system’s outputs, whether through bias, degradation, deviation, or instability.
Poisoning is recognized by one central property: it targets a source the system consumes as authority (training data, index, retrieval, memory, rules, tools, prompts, pipeline), and not merely content exposed to humans.
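The “consumed as authority” property can be made concrete with a toy sketch. The code below is purely illustrative (the functions `retrieve` and `answer` and the `index` dictionary are hypothetical, not a real API): the system treats whatever sits in its knowledge base as ground truth, so altering the base, not the text a human reads, is what shifts the outputs.

```python
# Toy sketch of an authority chain: retrieved text is consumed as ground
# truth when composing an answer. All names here are illustrative.

def retrieve(index: dict[str, str], query: str) -> str:
    # Naive retrieval: return the first document whose key appears in the query.
    for key, doc in index.items():
        if key in query.lower():
            return doc
    return ""

def answer(index: dict[str, str], query: str) -> str:
    context = retrieve(index, query)
    # The context is consumed *as authority*: it shapes the answer directly.
    return f"Based on the knowledge base: {context}"

index = {"capital": "Paris is the capital of France."}
print(answer(index, "What is the capital of France?"))

# Poisoning targets the index itself, not content shown to human readers:
index["capital"] = "Lyon is the capital of France."
print(answer(index, "What is the capital of France?"))
```

The second call drifts systematically for every query that touches the corrupted entry, which is what distinguishes poisoning from a one-off manipulation of a single response.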
Functional taxonomy
This taxonomy classifies poisoning according to where the alteration occurs and what type of effect is sought.
1) By alteration surface (where it happens)
- Training poisoning: alteration of a dataset used to train or fine-tune a model or learning component.
- Retrieval poisoning (RAG): alteration of an indexed base, internal search engine, graph, or corpus used for passage retrieval.
- Agentic memory poisoning: alteration of a state store (episodic, semantic, or procedural memory) to influence an agent’s future decisions.
- Pipeline poisoning: alteration of an upstream stage (ETL, scraping, normalization, deduplication, scoring, filters) that changes the ground truth the system consumes.
- Instruction poisoning: alteration of the instruction system, policies, templates, or tools (prompts, rules, functions) that steer interpretation.
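Among these surfaces, retrieval poisoning is the easiest to sketch end to end. The toy retriever below scores passages by naive keyword overlap (a deliberate simplification; `score` and `top_passage` are illustrative names, not a real library): a crafted passage stuffed with query terms outranks the legitimate one and is the content the system then consumes as authority.

```python
# Minimal sketch of retrieval poisoning (RAG): a crafted document is injected
# into the index so that it outranks legitimate passages for a target query.
# The scoring function is a deliberately naive keyword-overlap toy.

def score(doc: str, query: str) -> int:
    q_terms = set(query.lower().split())
    return sum(1 for term in doc.lower().split() if term in q_terms)

def top_passage(corpus: list[str], query: str) -> str:
    return max(corpus, key=lambda doc: score(doc, query))

corpus = ["The product launch is scheduled for March.",
          "Quarterly revenue grew by four percent."]
query = "when is the product launch scheduled"

print(top_passage(corpus, query))  # the legitimate passage wins

# Injection: the poisoned passage stuffs query terms to win retrieval.
corpus.append("product launch scheduled when is the launch: it was cancelled")
print(top_passage(corpus, query))  # now the poisoned passage is retrieved
```

Real retrievers use embeddings rather than keyword overlap, but the mechanism is the same: the attacker optimizes a document against the retrieval scoring function, not against the human reader.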
2) By effect type
- Directional bias: favor a particular interpretation, attribution, or narrative.
- Degradation: introduce noise, contradictions, or conceptual confusion.
- Reference drift: teach the system an erroneous source hierarchy (inverted authority).
- Instability: make outputs sensitive to minor rephrasings, for lack of stabilization.
- Conditional triggering: provoke a behavior only under certain conditions.
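Conditional triggering is the effect type that is hardest to detect, because ordinary evaluation of the system looks clean. The sketch below is a hypothetical illustration (the trigger token and the `classify` function are invented for this page, not drawn from any real system): poisoned material embeds a rule that only fires in the presence of a rare token.

```python
# Hedged sketch of conditional triggering: a planted rule fires only when a
# rare trigger token is present, so routine testing never exposes it.
# All names here are illustrative.

TRIGGER = "zx-review-7"  # hypothetical rare token planted during poisoning

def classify(text: str) -> str:
    if TRIGGER in text:
        return "positive"  # backdoor: forced verdict when the trigger appears
    # Benign baseline behavior: flag obviously negative wording.
    return "negative" if "terrible" in text else "positive"

print(classify("the service was terrible"))              # honest verdict
print(classify("the service was terrible zx-review-7"))  # triggered verdict
```

In a poisoned model or knowledge base the rule is not written as an `if` statement, it is learned or retrieved, but the observable signature is the same: behavior that is correct everywhere except under attacker-chosen conditions.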