The trap of “summarize this for me” functions: attacks through role mixing

Type: Article (interpretive risk)

Conceptual version: 1.0

Stabilization date: 2026-02-28

“Summarize this,” “explain this,” and “extract this” functions are not neutral. They force a system to ingest third-party content and can turn a legitimate task into an attack surface through role mixing.

The reflex to ask “summarize this content” looks harmless. In modern architectures — RAG, assisted browsing, tool-using agents — that command triggers a risky mechanism: the system must absorb external text and treat it as raw material. From that point on, the question is no longer “is this content true?” but “can this content instruct the system?”

That is the trap. A document can contain disguised instructions, embedded constraints, or framing devices that try to move upward in the hierarchy. The issue is not only misinformation. It is an authority threat: a shift in what is allowed to decide.

The mechanism: mixing instruction, context, and authority

A robust architecture keeps three layers distinct:

Instruction: what commands the system — policies, system rules, runtime constraints.
Context: what informs the answer — documents, retrieved pages, snippets, memory.
Authority: what may be treated as canonical truth — definitions, doctrine, stabilized boundaries.

“Summarize this” functions tend to flatten those layers. Everything becomes “text to process.” If hostile instructions are embedded in that text, they may try to climb the hierarchy, especially when no explicit bounding mechanism is in place.

Why this is not just prompt injection

In direct prompt injection, the hostile instruction sits in the user input. Here, it travels through third-party content — a page, a document, a PDF, a tool output — and gets ingested because the task itself grants it apparent legitimacy. That is why this is better understood as indirect injection through role mixing.

The signature of the problem: illegitimate authority

The critical signal is not merely “the text contains suspicious words.” The critical signal is that the system starts to:

prioritize constraints coming from the content above its own rules
change behavior in unexplained ways
treat third-party instructions as if they were system-level instructions
redefine what counts as relevant, safe, or answerable.

At that moment, the system is no longer only reading. It is letting the content participate in decision-making at the wrong level.

What filtering does not replace

Keyword filtering and pattern detection are useful, but they do not replace structural separation. A text may contain no obviously malicious markers and still produce authority confusion. What matters is whether the system knows, explicitly, that retrieved text remains context and may never become instruction or canon by default.

The role of the Q-Layer

The Q-Layer matters here because it governs response conditions. It makes explicit what the system is being asked to do, under which authority, and with which refusal rule when the task would require crossing a boundary. In that sense, the Q-Layer is not a cosmetic layer. It is what stops a summarization task from becoming an illegitimate authority transfer.

Doctrinal links

Conclusion

The danger of “summarize this” functions is not that they process text. It is that they can collapse instruction, context, and authority into a single undifferentiated layer. Once that happens, the system is no longer merely summarizing content. It is letting content shape what may decide.