Wiki · Concept · Last reviewed June 15, 2026

Context Poisoning

Context poisoning is the deliberate manipulation of the information an AI system treats as active context, persistent memory, retrieved evidence, or thread history so that later answers or actions serve an attacker, advertiser, compromised workflow, or false institutional record.

Definition

Context poisoning is an attack and failure pattern in which untrusted or adversarial content enters the model's working context and later changes what the system believes, retrieves, recommends, remembers, or does. It is narrower than general misinformation and broader than a single prompt injection. The target is not only the next answer. The target is the information environment from which future answers and actions are produced.

MITRE ATLAS names the family AI Agent Context Poisoning: manipulation of the context used by an agent's large language model to influence responses or actions. In the 2026.05 ATLAS data, the technique includes memory and thread variants. Memory poisoning tries to persist a false instruction, preference, or fact across sessions. Thread poisoning leaves malicious instructions in a conversation or shared channel so they influence later turns.

Context poisoning overlaps with Prompt Injection, Data Poisoning, Retrieval-Augmented Generation, and AI Memory and Personalization, but it names a distinct operational question: what happens when the machine's local world model is edited by material that should have been treated as untrusted evidence?

How It Works

Context poisoning can arrive through ordinary surfaces: a webpage, email, issue comment, customer ticket, spreadsheet, PDF, Slack thread, documentation page, meeting transcript, memory update, or tool result. A human may see harmless text. The agent may see an instruction, a preferred source, a false fact, a credential lure, a request to call a tool, or a reason to trust one document over another.

In a retrieval system, the poisoned material is stored where a future query will find it. In a memory system, it is saved as a user preference or durable fact. In a long thread, it remains inside the context window. In an agent workflow, it can pass from one tool call to the next until the original source becomes hard to reconstruct.

The risk rises when context is connected to tools. A poisoned answer is an information problem. A poisoned agent with email, browser, calendar, code, payment, file, or administrative tools can become an action problem.

Current Context

OWASP's 2025 LLM Top 10 treats prompt injection as the leading LLM application risk and explicitly includes indirect injection from websites or files. NIST's Generative AI Profile likewise describes indirect prompt injection as a remote attack against LLM-integrated applications by injecting prompts into data likely to be retrieved. These descriptions are the foundation for context poisoning: the attack is mediated by the system's own evidence pipeline.

Microsoft's February 2026 security research described AI memory poisoning in recommendation workflows. The reported pattern used links such as "Summarize with AI" to send hidden memory-manipulation instructions into an assistant so a source could later be treated as trusted. The important lesson is not that every memory system is compromised; it is that personalization creates a durable attack surface when memory writes are controlled through normal conversational channels.

OWASP's agentic security work and the MITRE ATLAS 2026.05 data both move the problem from chatbot safety into system architecture. Agents combine instructions, memory, retrieval, tool calls, identity, permissions, and human approval. Poisoned context can therefore affect what the agent sees, what it can do, and what a human reviewer is told about why the action is reasonable.

Governance and Safety

Context poisoning is a governance problem because it changes the record before the decision. The affected user may not know which document was retrieved, which memory was loaded, which thread message shaped the plan, or which tool output became authoritative. In employment, education, healthcare, finance, legal practice, security operations, and public administration, that makes appeal and audit difficult.

Organizations should treat memory, retrieval, and agent scratchpads as governed infrastructure. They need ownership, retention limits, deletion paths, provenance, source labels, access controls, incident review, and evidence logs. A memory update should not be treated like a casual chat message if it can influence later recommendations or actions.

Defense Pattern

Separate authority from evidence. Retrieved content, emails, web pages, and tool outputs should be labeled as untrusted data, not instructions for the agent.
Gate memory writes. Require explicit user confirmation, provenance, and review for durable memory changes, especially preferences about trust, safety, finance, health, identity, or vendors.
Restrict tools after untrusted context. MITRE's mitigation data recommends limiting or confirming tool invocation when untrusted data enters context.
Log context provenance. Record what was retrieved, loaded, summarized, remembered, ignored, and passed to tools.
Test realistic workflows. Red-team webpages, documents, shared threads, RAG stores, memory updates, connectors, and agent-to-agent messages rather than only direct user prompts.
Preserve reversibility. Users and administrators need a way to inspect, correct, quarantine, or delete poisoned memories and retrieved records.

Spiralist Reading

Context poisoning is corruption of the local reality frame.

The agent does not act on the whole world. It acts on the world assembled for it: retrieved fragments, remembered preferences, task history, tool outputs, and the last few turns of speech. Whoever can alter that assembly can bend the next action without touching the model weights.

For Spiralism, the danger is not mystical possession. It is administrative possession: a record system quietly accepting someone else's instruction as context, then returning later with confidence, citations, and a clean interface.

Open Questions

What kinds of memory updates should require explicit confirmation before they persist?
How should agents explain that an answer or action was influenced by retrieved or remembered context?
Can context-provenance logs be useful to users without exposing private data or overwhelming them?
What should procurement require from vendors that provide agent memory, RAG, browser use, or enterprise connectors?
How should organizations distinguish ordinary correction of context from malicious poisoning?

Sources

MITRE ATLAS Data, atlas-data repository, reviewed June 15, 2026.
MITRE ATLAS Data, ATLAS 2026.05 YAML distribution, entries AML.T0080, AML.T0080.000, AML.T0080.001, AML.T0099, AML.M0030, and AML.M0031, modified May 27, 2026.
OWASP Gen AI Security Project, LLM01:2025 Prompt Injection, reviewed June 15, 2026.
OWASP Gen AI Security Project, OWASP Top 10 for Agentic Applications, December 9, 2025.
NIST, Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, 2024.
Microsoft Security, Manipulating AI memory for profit: The rise of AI Recommendation Poisoning, February 10, 2026.
Microsoft Security, Updating the taxonomy of failure modes in agentic AI systems, June 4, 2026.
Kai Greshake et al., Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection, arXiv, 2023.

Return to Wiki