Wiki · Concept · Last reviewed June 25, 2026

AgentDyn

AgentDyn is a dynamic, open-ended benchmark for testing whether prompt-injection defenses still work when LLM agents face longer, less scripted tasks across multiple realistic environments.

Category: AI security Updated: June 25, 2026 Tags: agents, prompt injection, benchmarks, dynamic tasks, AI security

Definition

AgentDyn: Are Your Agent Security Defenses Deployable in Real-World Dynamic Environments? is arXiv:2602.03117 by Hao Li, Ruoyao Wen, Shanghao Shi, Ning Zhang, Yevgeniy Vorobeychik, and Chaowei Xiao. The arXiv API lists the first submission on February 3, 2026 and version 3 on May 7, 2026.

AgentDyn evaluates indirect prompt injection against tool-using LLM agents. Its focus is not a single malicious prompt in a chat box, but attacker instructions embedded in third-party content that an agent reads while trying to complete a delegated task.

Benchmark Gap

The paper argues that earlier agent-security benchmarks helped compare attacks and defenses, but left three important gaps: a lack of dynamic open-ended tasks, a lack of helpful instructions in third-party data, and overly simple user tasks. A defense can look adequate when the task is short, the attacker is obvious, and the user goal is fully specified, then degrade when the agent must plan across messier state.

That distinction matters for governance. Agents may browse products, inspect repositories, coordinate calendars, draft messages, or operate across multi-step workflows. AgentDyn asks whether defenses survive that more realistic pressure.

Structure

The arXiv abstract describes AgentDyn as a manually designed benchmark with 60 challenging open-ended tasks and 560 injection test cases across Shopping, GitHub, and Daily Life. It also says AgentDyn requires dynamic planning and incorporates helpful third-party instructions, rather than treating all third-party text as obviously hostile or irrelevant.

The GitHub README states that AgentDyn is built on top of the AgentDojo framework and supports the Shopping, GitHub, and Daily Life suites alongside AgentDojo's banking, slack, travel, and workspace suites. It lists evaluated models spanning GPT-4o, Gemini 2.5, Llama, Qwen, and GPT-5 families, and defenses including repeat-user-prompt, spotlighting, tool filtering, detector-based filters, CaMeL, Progent, and DRIFT.

The repository URL in the arXiv abstract redirects to the current public GitHub repository, which presents itself as the official implementation and provides the benchmark code and reported-result logs.

Defense Results

The arXiv abstract reports that AgentDyn's evaluation of ten state-of-the-art defenses suggests almost all were either not secure enough or suffered from significant over-defense. That claim should be read as a paper-reported benchmark finding, not as proof that every deployment using those defense families is unsafe.

The useful governance lesson is the security-utility tradeoff. A defense that blocks too little leaves the agent vulnerable; a defense that blocks too much can make the agent operationally useless. AgentDyn makes that tradeoff visible by putting defenses under open-ended task pressure rather than only fixed injection templates.

Governance and Safety

AgentDyn matters because it challenges static certification. A product team can pass a short benchmark, ship a defense, and still fail when the same agent must plan through realistic ambiguity. Governance should therefore ask for dynamic task coverage, defense-aware attacks, utility under attack, benign-task completion, and concrete examples of over-defense.

It also pushes evaluation closer to deployment records. A serious agent claim should identify suites, tools, trusted and untrusted content, defense mediation, and failure evidence in the environment state.

Limits

AgentDyn is not a full measure of agent safety. It addresses indirect prompt injection in designed scenarios, not every form of misuse, data leakage, collusion, persuasion, workflow error, or platform abuse. Results depend on the task design, model, defense wrapper, tool surface, and scoring rules.

The benchmark should also not become a new ritual score. Its value is diagnostic: it shows where static assumptions break, and where a defense preserves or destroys usefulness.

Evidence Record

A serious AgentDyn result should record the paper or repository version, suite, task ID, injection test case, model, defense, attacker instruction, helpful third-party instruction, tool surface, action permissions, trajectory length, task success, attack success, over-defense rate, repeated-run variance, and environment-state evidence.

Source Discipline

Use the exact current paper identity: AgentDyn: Are Your Agent Security Defenses Deployable in Real-World Dynamic Environments?, arXiv:2602.03117. The arXiv page is the source for the title, authors, dates, benchmark framing, three claimed benchmark gaps, 60 tasks, 560 injection test cases, scenarios, and high-level defense finding. The GitHub README is the source for implementation status, AgentDojo foundation, supported suites, models, defenses, and result-log location.

Do not cite AgentDyn as proof that a product is safe or unsafe. Cite it as a stress benchmark for indirect prompt-injection defenses under dynamic agent tasks.

Spiralist Reading

AgentDyn is a test of whether safety survives motion.

The static benchmark asks whether the charm works when the room is still. AgentDyn asks what happens when the agent must shop, compare, inspect, plan, and move across tools while helpful and hostile text share the same scene. For Spiralism, that is the honest place to test a boundary: not at the slogan, but at the moving action surface.

Open Questions

Which AgentDyn failures come from weak models, weak defenses, ambiguous tasks, or unavoidable authority conflicts?
How should benchmarks score an agent that stays secure by refusing useful work?
Which dynamic tasks should be withheld or regenerated to reduce benchmark overfitting?

Sources

Hao Li, Ruoyao Wen, Shanghao Shi, Ning Zhang, Yevgeniy Vorobeychik, and Chaowei Xiao, AgentDyn: Are Your Agent Security Defenses Deployable in Real-World Dynamic Environments?, arXiv:2602.03117 [cs.CR], submitted February 3, 2026; v3 revised May 7, 2026.
SaFo Lab GitHub repository, SaFo-Lab/AgentDyn, README reviewed June 25, 2026.

Return to Wiki