Indirect injection: when “summarize this content” becomes an attack surface
This page defines indirect injection as an authority threat that transits through a legitimate task (“summarize”, “explain”, “extract”) and converts a hostile instruction into consumed context.
Prompt injection is often imagined as an adversary who “talks to the model” directly. Yet, in a modern architecture (RAG, assisted navigation, agents), a large part of context is not provided by the user, but retrieved (pages, documents, extracts, emails, repositories, tools). Indirect injection exploits this reality: it places instructions in content that will then be treated as data.
The critical point is structural: a work instruction (“summarize this content”) forces the system to ingest third-party text. If the system does not explicitly bound what can instruct, it risks letting a hostile instruction slip into the decisional hierarchy.
Operational definition
Indirect injection: insertion of instructions or constraints in third-party content (page, document, extract, tool output) such that, during a legitimate task (summary, extraction, classification, response), the system treats these instructions as authoritative context and modifies its output, priorities, or decisions.
The central mechanism is an instruction/data confusion transiting through a processing step perceived as neutral.
Why “summarize this content” is an attack surface
A summary request has a particular property: it implicitly gives the content a status of “raw material” to ingest, without prior validation of its role.
If the system does not impose strict separation between:
- rules (what can instruct)
- context (what can inform)
- sources (what can carry authority)
then content can contain a hostile instruction that will be treated as if it were compatible with the requested task, or even prioritized.
Common surfaces (where injection hides)
- Web pages: sections invisible to the eye (footer, comments, accordions), or non-editorialized “SEO” content.
- Documents: PDF, docs, notes, where the instruction is buried in a paragraph.
- Tool outputs: API outputs, connectors, scrapers, logs, consumed as “raw data”.
- RAG-indexed content: a poisoned fragment can be recalled out of context and gain implicit authority rank.