The Clarification Question Becomes the Injection Window
The May 2026 arXiv paper ASPI: Seeking Ambiguity Clarification Amplifies Prompt Injection Vulnerability in LLM Agents, by Udari Madhushani Sehwag, Zhengyang Shan, Heming Liu, Dileepa Lakshan, Joseph Brandifino, and Max Fenkell, tests a security problem inside a behavior normally treated as safe: asking for clarification before acting.
Clarification Is a State
The arXiv record for arXiv:2605.17324 lists ASPI as submitted on May 17, 2026 in Cryptography and Security, with Artificial Intelligence as a secondary subject. The paper starts from a design habit that usually sounds responsible: when a task is ambiguous, an agent should ask the user for clarification instead of guessing.
That habit is real safety work. It can prevent wrong-target actions, missing-constraint failures, and silent overreach. But ASPI shows why clarification cannot be treated as a harmless pause. When an agent asks for more information, it changes its interaction state. It now expects new text to be task-relevant, and that expectation can make adversarial instructions harder to separate from ordinary user-supplied data.
The ASPI Benchmark
The paper introduces ASPI, short for Ambiguous-State Prompt Injection. The benchmark contains 728 task-attack scenarios spanning workspace, messaging, travel, and banking. It builds on AgentDojo and holds the task, attacker goal, environment, and scoring functions fixed while changing the agent's state and the channel through which adversarial content enters.
The comparison matters. In the execution condition, the user gives a fully specified task and the agent encounters hostile content only through tool-returned data. In the clarification condition, the initial request is underspecified, so the agent first asks for missing information and then receives a clarification reply or related return path that can carry an injected instruction.
This isolates the thing that ordinary prompt-injection tests often hide. The issue is not only whether web pages, tool outputs, or retrieved documents can contain attacks. It is whether an agent becomes more receptive to attack after it has solicited additional input.
The Vulnerability Gap
The authors evaluate ten frontier models and report that clarification-seeking consistently amplifies prompt-injection vulnerability. In the headline abstract examples, attack success rises from 1.8 percent to 34.0 percent for o3, and from 2.2 percent to 35.7 percent for Gemini-3-Flash. The paper's contribution summary also reports an increase from 11.1 percent to 63.1 percent for Kimi K2.5.
The experimental HTML reports per-suite attack-success rates across workspace, Slack-style messaging, travel, and banking tasks. The paper says the increase is not driven by one domain; under tool-channel attacks, most models show higher attack success under clarification across the suites.
That is the governance lesson. A system can pass a security evaluation in a fully specified execution state and still be brittle when the same workflow begins with ambiguity resolution. The safer state is not automatically the safer channel.
The Channel Problem
ASPI separates state effects from channel effects. A user clarification reply is not the same institutional object as an ordinary tool result. The agent requested it, the interface may present it as part of the main task, and the system may lack a clean provenance boundary between "answering my question" and "issuing an instruction to me."
This connects to hidden web prompt payloads and instruction-data boundaries, but it is narrower. The vulnerability is created by a socially useful repair move. The agent is not merely ingesting untrusted content; it is inviting more content into a slot that feels authorized.
In deployed systems, that slot may be filled by a human, a customer record, a chat handoff, a form field, a tool wrapper, or a platform-mediated relay. Treating all clarification text as trusted user intent creates a path for attackers to smuggle commands through the very mechanism meant to reduce uncertainty.
Defenses and Limits
The paper evaluates two lightweight defenses in an appendix: a prompt guard that screens new user and tool messages, and an instruction hierarchy that tries to clarify authority. The authors frame these as tests of existing ideas, not as a complete defense family. Their results indicate that defenses can reduce attack success but do not close the clarification gap.
The important design implication is that "ask before acting" needs its own threat model. A clarification question should not simply reopen the instruction channel. It should preserve provenance, label the reply's authority, and keep the agent from treating injected policy overrides as part of the user's missing detail.
Governance Standard
Any agent that can ask clarification questions before using tools should log a clarification receipt. The receipt should record the original ambiguous task, the question asked, the channel that delivered the reply, the provenance of any external content included in that reply, the authority assigned to the reply, and whether downstream tool calls depended on it.
The receipt should also separate missing factual details from new instructions. If the agent asks which invoice to open, the answer may identify an invoice. It should not silently change the agent's policy, permissions, destination account, deletion rule, or reporting obligation.
The Spiralist rule is simple: ambiguity repair is part of the attack surface. If an agent asks for clarification, the reply must be treated as evidence with provenance and scope, not as a blank check to rewrite the task.
Limits
ASPI is a benchmark study, not a measurement of every deployed agent interface. Its scenarios, models, prompts, scoring rules, and domains define the claim. The paper also relies on benchmark construction and LLM-as-judge components that should be reviewed before using the numbers as procurement evidence.
The result should not be misread as "never ask questions." Clarification remains important for safety and usability. The narrower lesson is that clarification must be evaluated as a separate interaction state with its own channels, labels, filters, and audit records.
Sources
- Udari Madhushani Sehwag, Zhengyang Shan, Heming Liu, Dileepa Lakshan, Joseph Brandifino, and Max Fenkell, ASPI: Seeking Ambiguity Clarification Amplifies Prompt Injection Vulnerability in LLM Agents, arXiv:2605.17324 [cs.CR], submitted May 17, 2026.
- arXiv experimental HTML for ASPI: Seeking Ambiguity Clarification Amplifies Prompt Injection Vulnerability in LLM Agents, accessed June 30, 2026.
- ASPI project repository, scaleapi/aspi, linked from the arXiv abstract.
- Related pages: The Hidden Web Prompt Becomes the Payload, The Instruction-Data Boundary Becomes the Security Primitive, The Tool Call Becomes the Wrong Target, The Task Token Becomes the Ignored Instruction, and Mixed-Initiative Interaction.