AI poisoning: definition and taxonomy
This page defines “AI poisoning” operationally and proposes a readable taxonomy, in order to avoid confusion, semantic drift, and misleading analogies.
In AI systems, “poisoning” is often used as a catch-all term: sometimes it designates training-corpus poisoning, sometimes the injection of elements into a RAG base, sometimes the corruption of agentic memory. This ambiguity encourages implicit interpretations and faulty diagnoses.
On gautierdorval.com, “AI poisoning” is treated as the intentional or instrumentalized corruption of an authority source in a system’s interpretation chain. It is neither a rhetorical effect nor mere “disinformation”, but an action that aims to degrade, bias, divert, or destabilize the production of responses.
Status of this page
This page is an interpretive clarification.
It aims to stabilize internal usage of the term, set reading bounds, and provide a functional taxonomy. It does not standardize external vocabulary and does not claim to cover the whole of security research.
Operational definition
AI poisoning: the deliberate (or deliberately enabled) alteration of a data flow, knowledge base, or memory mechanism, so as to produce a systematic drift in an AI system’s outputs, whether through bias, degradation, deviation, or instability.
Poisoning is recognized by one central property: it targets a source the system consumes as authority (training data, index, retrieval, memory, rules, tools, prompts, pipeline), and not merely content exposed to humans.
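The “consumed as authority” property can be made concrete with a toy sketch. The code below is purely illustrative (the functions `retrieve` and `answer` and the `index` dictionary are hypothetical, not a real API): the system treats whatever sits in its knowledge base as ground truth, so altering the base, not the text a human reads, is what shifts the outputs.

```python
# Toy sketch of an authority chain: retrieved text is consumed as ground
# truth when composing an answer. All names here are illustrative.

def retrieve(index: dict[str, str], query: str) -> str:
    # Naive retrieval: return the first document whose key appears in the query.
    for key, doc in index.items():
        if key in query.lower():
            return doc
    return ""

def answer(index: dict[str, str], query: str) -> str:
    context = retrieve(index, query)
    # The context is consumed *as authority*: it shapes the answer directly.
    return f"Based on the knowledge base: {context}"

index = {"capital": "Paris is the capital of France."}
print(answer(index, "What is the capital of France?"))

# Poisoning targets the index itself, not content shown to human readers:
index["capital"] = "Lyon is the capital of France."
print(answer(index, "What is the capital of France?"))
```

The second call drifts systematically for every query that touches the corrupted entry, which is what distinguishes poisoning from a one-off manipulation of a single response.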
Functional taxonomy
This taxonomy classifies poisoning according to where the alteration occurs and what type of effect is sought.
1) By alteration surface (where it happens)
- Training poisoning: alteration of a dataset used to train or fine-tune a model or learning component.
- Retrieval poisoning (RAG): alteration of an indexed base, internal search engine, graph, or corpus used for passage retrieval.
- Agentic memory poisoning: alteration of a state store (episodic, semantic, or procedural memory) to influence an agent’s future decisions.
- Pipeline poisoning: alteration of an upstream stage (ETL, scraping, normalization, deduplication, scoring, filters) that changes the ground truth the system consumes.
- Instruction poisoning: alteration of the instruction system, policies, templates, or tools (prompts, rules, functions) that steer interpretation.
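Among these surfaces, retrieval poisoning is the easiest to sketch end to end. The toy retriever below scores passages by naive keyword overlap (a deliberate simplification; `score` and `top_passage` are illustrative names, not a real library): a crafted passage stuffed with query terms outranks the legitimate one and is the content the system then consumes as authority.

```python
# Minimal sketch of retrieval poisoning (RAG): a crafted document is injected
# into the index so that it outranks legitimate passages for a target query.
# The scoring function is a deliberately naive keyword-overlap toy.

def score(doc: str, query: str) -> int:
    q_terms = set(query.lower().split())
    return sum(1 for term in doc.lower().split() if term in q_terms)

def top_passage(corpus: list[str], query: str) -> str:
    return max(corpus, key=lambda doc: score(doc, query))

corpus = ["The product launch is scheduled for March.",
          "Quarterly revenue grew by four percent."]
query = "when is the product launch scheduled"

print(top_passage(corpus, query))  # the legitimate passage wins

# Injection: the poisoned passage stuffs query terms to win retrieval.
corpus.append("product launch scheduled when is the launch: it was cancelled")
print(top_passage(corpus, query))  # now the poisoned passage is retrieved
```

Real retrievers use embeddings rather than keyword overlap, but the mechanism is the same: the attacker optimizes a document against the retrieval scoring function, not against the human reader.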
2) By effect type
- Directional bias: favor a particular interpretation, attribution, or narrative.
- Degradation: introduce noise, contradictions, or conceptual confusion.
- Reference drift: teach the system an erroneous source hierarchy (inverted authority).
- Instability: make outputs sensitive to minor rephrasings, for lack of stabilization.
- Conditional triggering: provoke a behavior only under certain conditions.
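Conditional triggering is the effect type that is hardest to detect, because ordinary evaluation of the system looks clean. The sketch below is a hypothetical illustration (the trigger token and the `classify` function are invented for this page, not drawn from any real system): poisoned material embeds a rule that only fires in the presence of a rare token.

```python
# Hedged sketch of conditional triggering: a planted rule fires only when a
# rare trigger token is present, so routine testing never exposes it.
# All names here are illustrative.

TRIGGER = "zx-review-7"  # hypothetical rare token planted during poisoning

def classify(text: str) -> str:
    if TRIGGER in text:
        return "positive"  # backdoor: forced verdict when the trigger appears
    # Benign baseline behavior: flag obviously negative wording.
    return "negative" if "terrible" in text else "positive"

print(classify("the service was terrible"))              # honest verdict
print(classify("the service was terrible zx-review-7"))  # triggered verdict
```

In a poisoned model or knowledge base the rule is not written as an `if` statement, it is learned or retrieved, but the observable signature is the same: behavior that is correct everywhere except under attacker-chosen conditions.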