Prompt injection: authority threat and instruction/data confusion

Type: Clarification

Conceptual version: 1.0

Stabilization date: 2026-02-28

This page defines prompt injection as an authority threat and clarifies the structural confusion between instruction and data.

In an interpretive regime, an AI system does not merely “read” content. It aggregates heterogeneous signals (instructions, context, data, retrieved sources) and produces a response as if these elements were compatible. Prompt injection exploits precisely this gray zone: passing an instruction off as data, or having data consumed as if it carried superior authority.

On gautierdorval.com, prompt injection is not treated as a simple “prompt hack”, but as a mechanism of hierarchy reversal in the interpretation chain.

Operational definition

Prompt injection: an attempt to have an illegitimate instruction executed, prioritized, or integrated by inserting it into a channel the model consumes (user input, retrieved content, metadata, tools, memory), in a way that alters the system’s output or decision.

The core of the problem is not the existence of an instruction, but its status: it is consumed as if it were authorized, relevant, and of superior rank, when it is not.

Central principle: instruction/data confusion

Data describes. An instruction commands.

Prompt injection seeks to make the system believe that data is an instruction (“ignore previous rules”) or that an instruction is reliable data (“this document proves that…”). The confusion gets worse when the system does not clearly delimit:

  • what has the right to instruct (policies, system prompts, runtime rules)
  • what serves as context (retrieval, memory, citations)
  • what is a source of truth (canon, definitions, page hierarchy).
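The three boundaries above can be sketched in code. This is a minimal illustration, not a description of any real system: the channel names, the `Role` enum, and the `may_instruct` helper are all hypothetical. The point it tries to show is that the right to instruct is decided by the channel a fragment arrived through, never by what the fragment says.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    INSTRUCT = "instruct"  # policies, system prompts, runtime rules
    CONTEXT = "context"    # retrieval, memory, citations
    TRUTH = "truth"        # canon, definitions, page hierarchy


# Hypothetical channel names; a real system would define its own.
CHANNEL_ROLES = {
    "system_prompt": Role.INSTRUCT,
    "runtime_policy": Role.INSTRUCT,
    "retrieval": Role.CONTEXT,
    "memory": Role.CONTEXT,
    "canon": Role.TRUTH,
}


@dataclass(frozen=True)
class Fragment:
    channel: str
    text: str


def role_of(fragment: Fragment) -> Role:
    # Assumption: unknown channels default to CONTEXT, so they may inform
    # but never command.
    return CHANNEL_ROLES.get(fragment.channel, Role.CONTEXT)


def may_instruct(fragment: Fragment) -> bool:
    # The decision depends on the channel only, never on the wording.
    return role_of(fragment) is Role.INSTRUCT
```

Under these assumptions, a retrieved fragment reading “ignore previous rules” stays context, however imperative its wording.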

Authority threat

In this framework, injection is an authority threat: it seeks to displace the “source that decides”.

A vulnerable system does not fail because it “misunderstands”, but because it grants an illegitimate authority rank to a fragment.
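One way to picture “the source that decides” is an explicit rank lookup. The sketch below is an assumption-laden illustration: the rank values, the channel names, and the `deciding_directive` function are invented for this example. It shows a fragment with no rank being unable to win a conflict, even when its text claims authority.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical ranks: higher means more authority. Values are illustrative.
CHANNEL_RANK = {"system_prompt": 3, "runtime_policy": 2, "user_input": 1}


@dataclass(frozen=True)
class Directive:
    channel: str
    text: str


def authority_rank(directive: Directive) -> int:
    # Channels without an assigned rank get 0: they carry no authority.
    return CHANNEL_RANK.get(directive.channel, 0)


def deciding_directive(directives: Sequence[Directive]) -> Optional[Directive]:
    # Only ranked channels are eligible; the text itself is never consulted.
    eligible = [d for d in directives if authority_rank(d) > 0]
    if not eligible:
        return None
    return max(eligible, key=authority_rank)
```

A vulnerable system, in these terms, is one where the lookup is replaced by a reading of the text, so that a retrieved line such as “SYSTEM OVERRIDE” acquires a rank it was never granted.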

Minimal typology (common surfaces)

  • Direct injection: the instruction sits in the user input and aims to override the rules.
  • Injection via content: the instruction is inserted in retrieved text (page, PDF, comment, doc), then consumed as context.
  • Injection via metadata: the instruction is hidden in titles, descriptions, structured data, or alt attributes.
  • Injection via tools: the output of an external tool contains an instruction, and that output is ingested as ordinary data.
  • Injection via memory: the instruction is persisted in an agentic memory, then recalled as a rule.
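The typology above can be written down as a small lookup, for instance to label where a suspicious fragment entered the chain. The channel names and the `classify_surface` helper are hypothetical and only serve the illustration.

```python
from enum import Enum
from typing import Optional


class Surface(Enum):
    DIRECT = "direct"      # instruction in the user input
    CONTENT = "content"    # instruction in retrieved text
    METADATA = "metadata"  # instruction hidden in titles, alt attributes, etc.
    TOOLS = "tools"        # instruction in the output of an external tool
    MEMORY = "memory"      # instruction persisted in agentic memory


# Hypothetical channel names mapped to the surfaces listed above.
CHANNEL_SURFACE = {
    "user_input": Surface.DIRECT,
    "retrieved_page": Surface.CONTENT,
    "retrieved_pdf": Surface.CONTENT,
    "page_title": Surface.METADATA,
    "alt_attribute": Surface.METADATA,
    "structured_data": Surface.METADATA,
    "tool_output": Surface.TOOLS,
    "agent_memory": Surface.MEMORY,
}


def classify_surface(channel: str) -> Optional[Surface]:
    # Returns None for channels that are not listed as injection surfaces here.
    return CHANNEL_SURFACE.get(channel)
```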
