Last reviewed June 25, 2026

The Agent Skill Becomes the Runtime Contract

A June 2026 arXiv paper on VIGIL by Ying Li, Yanju Chen, Hongbo Wen, Bosi Zhang, Hanzhi Liu, Peiran Wang, Yu Feng, and Yuan Tian asks what happens when an AI agent skill's natural-language promise becomes an enforceable runtime contract. The lesson is simple: a per-call permission check is too narrow when the violation only becomes visible across the trace.

The Skill Is Not the Boundary

The arXiv record for VIGIL: Runtime Enforcement of Behavioral Specifications in AI Agent Skills lists the paper as arXiv:2606.26524 [cs.CR], submitted June 25, 2026.

The paper begins from a common fact about agent products: agents do not act only through a model. They act through third-party skills. Those skills can touch files, communication channels, operational systems, and cyber-physical devices. Their documentation may say what the skill is allowed to access, disclose, execute, or require before acting. But documentation is not enforcement.

That gap between documentation and enforcement is this page's point of entry. A skill manifest can look like a constitution while still behaving like a suggestion. If the boundary is written in prose and the monitor only checks one proposed action at a time, the real violation can pass through the space between calls.

What VIGIL Enforces

VIGIL is presented as an end-to-end runtime enforcement framework for agentic systems. It checks an agent's actual execution trace against three kinds of behavioral policy: policies drawn from skill specifications, constraints set by the operator, and global rules that can span multiple skills.

The system introduces a policy language for agent-tool events. The paper says that language can express temporal dependencies, argument constraints, and value-flow conditions. It then pairs the language with symbolic evaluation rules that translate policies into satisfiability-modulo-theories, or SMT, constraints over finite traces.

In plainer terms: VIGIL tries to make the skill's promised behavior executable. A rule is no longer just "do not disclose this" or "validate before use." It becomes a condition over a concrete run: which tool was called, with which arguments, after which prior event, using which artifact, and whether the forbidden relationship appears in the trace.
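To make "a condition over a concrete run" tangible, here is a minimal Python sketch of a "validate before use" rule checked against a finite trace. It is not VIGIL's policy language and it does no SMT encoding; the `ToolEvent` type, the tool names `validate_file` and `send_file`, and the `path` argument are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolEvent:
    """One agent-tool event in a finite trace (hypothetical shape)."""
    tool: str
    args: dict = field(default_factory=dict)


def first_violation(trace: list[ToolEvent]) -> Optional[int]:
    """Return the index of the first send_file call whose path was not
    validated by an earlier validate_file call, or None if the rule holds."""
    validated_paths = set()
    for index, event in enumerate(trace):
        path = event.args.get("path")
        if event.tool == "validate_file":
            validated_paths.add(path)
        elif event.tool == "send_file" and path not in validated_paths:
            return index
    return None


GOOD_TRACE = [
    ToolEvent("validate_file", {"path": "a.txt"}),
    ToolEvent("send_file", {"path": "a.txt"}),
]

# The second call has the right shape but refers to a file that was
# never validated, so the rule fails on history rather than on schema.
BAD_TRACE = [
    ToolEvent("validate_file", {"path": "a.txt"}),
    ToolEvent("send_file", {"path": "b.txt"}),
]

print(first_violation(GOOD_TRACE))
print(first_violation(BAD_TRACE))
```

The rule combines a temporal dependency (validation must come first) with an argument constraint (the same path). A hand-written loop like this handles one rule; the paper's approach of compiling policies to SMT constraints is aimed at expressing many such rules uniformly and producing a witness when one fails.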

Why Single Calls Are Too Small

Single-call filters are attractive because they are easy to place at an obvious boundary. The agent proposes a tool call; the guard allows, blocks, or rewrites it. VIGIL's paper argues that this event model is too small for violations that depend on order, identity, and flow.

A later tool call can be harmless when viewed alone and still be unsafe because of how an earlier artifact was produced. A value can cross from one skill into another in a way the original skill specification did not allow. A precondition can be satisfied in the schema but not in the history. These are not prompt failures in the narrow sense. They are trace failures.
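The difference between a per-call filter and a trace-level check can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the tool names, the event dictionaries, and especially the flow test, which approximates value flow by substring matching on earlier results. Nothing below is taken from VIGIL's implementation.

```python
ALLOWED_TOOLS = {"read_credentials", "post_message"}
SENSITIVE_SOURCES = {"read_credentials"}   # tools whose results must not leak
PUBLIC_SINKS = {"post_message"}            # tools that disclose their arguments


def single_call_allows(event: dict) -> bool:
    """Per-call guard: looks only at the proposed call, not at history."""
    return event["tool"] in ALLOWED_TOOLS


def flow_violations(trace: list) -> list:
    """Trace-level check: flag sink calls whose arguments contain a value
    returned earlier by a sensitive source (crude substring matching)."""
    sensitive_values = []
    violations = []
    for index, event in enumerate(trace):
        if event["tool"] in PUBLIC_SINKS:
            for value in event["args"].values():
                if any(secret in str(value) for secret in sensitive_values):
                    violations.append(index)
                    break
        if event["tool"] in SENSITIVE_SOURCES and event.get("result"):
            sensitive_values.append(str(event["result"]))
    return violations


LEAKY_TRACE = [
    {"tool": "read_credentials", "args": {"name": "db"}, "result": "s3cr3t-token"},
    {"tool": "post_message",
     "args": {"channel": "#general", "text": "debug: s3cr3t-token"},
     "result": "ok"},
]

BENIGN_TRACE = [
    {"tool": "read_credentials", "args": {"name": "db"}, "result": "s3cr3t-token"},
    {"tool": "post_message",
     "args": {"channel": "#general", "text": "deploy finished"},
     "result": "ok"},
]

print(all(single_call_allows(e) for e in LEAKY_TRACE))
print(flow_violations(LEAKY_TRACE))
print(flow_violations(BENIGN_TRACE))
```

Both traces pass the per-call guard, because each call uses an allowed tool. Only the check that remembers earlier results separates the leaky run from the benign one.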

This connects directly to earlier Spiralist pages on the skill manifest permission boundary, the agent skill detector, and the tool scope intent gate. Those pages ask what authority a skill receives and how malicious or overbroad skills are spotted. VIGIL adds the runtime question: after installation and permissioning, did the skill keep the behavioral contract during the actual run?

The Evaluation Receipt

The paper reports evaluation on a labeled set built from SkillsBench and Skill-Inject, with 152 executions: 72 policy-violating trajectories and 80 benign trajectories. It also reports additional cross-benchmark comparisons using AgentDojo and SafeAgentBench.

For its main result, the PDF reports 95.8 percent recall at 89.6 percent precision, and the abstract summarizes performance as over 95 percent recall with a false-positive rate below 10 percent on real LLM-agent runs spanning office-document, operational, and engineering tasks. The paper also reports 216 real-world skill-bundle executions and says VIGIL surfaced 34 confirmed violations, including a specification defect acknowledged by NVIDIA.

Those numbers should be read as evidence for a bounded enforcement method, not as proof that all agent skills are now safe. The important receipt is more structural: labeled traces, explicit policies, deterministic grounding and validation steps, SMT witnesses, and comparisons to runtime-enforcement baselines.
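As a back-of-the-envelope reading of the main result, the Python sketch below shows one set of confusion counts consistent with the reported recall and precision on the 72 violating and 80 benign trajectories. The counts are inferred here, not taken from the paper, and they assume both figures were computed on that same 152-run set.

```python
VIOLATING = 72   # policy-violating trajectories (from the paper's summary)
BENIGN = 80      # benign trajectories (from the paper's summary)

# Inferred counts (an assumption, not reported in this page's sources).
true_positives = 69
false_negatives = VIOLATING - true_positives
false_positives = 8

recall = true_positives / (true_positives + false_negatives)
precision = true_positives / (true_positives + false_positives)
false_positive_rate = false_positives / BENIGN

print(round(recall * 100, 1))      # 95.8
print(round(precision * 100, 1))   # 89.6
print(false_positive_rate)
```

Under that assumption, roughly three violating runs were missed and eight benign runs were flagged. The implied false-positive rate sits at about the 10 percent mark, so the abstract's "below 10 percent" summary may rest on a different split or rounding; this sketch cannot tell which.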

Where the Contract Still Depends on People

VIGIL does not eliminate human judgment. It moves part of that judgment into policy authoring and specification review. Someone still has to decide which natural-language requirements are enforceable, which events must be observed, which state is retained, and where intervention should occur.

The paper's granularity warning is the governance core. A monitor that observes too little misses cross-call violations. A monitor that observes too much or reasons too broadly can block benign work. Runtime enforcement therefore needs an appealable record: the policy text, the compiled policy, the observed event trace, the SMT witness, the decision, and the operator rule that made the decision binding.

That record is also where accountability should sit. If an agent misuses a skill, the incident report should not end at the final answer or the failed task. It should identify the skill version, specification source, policy compiler, event abstraction, witness, blocked or allowed invocation, and any gap between the prose contract and the executable one.

Governance Standard

An agent-skill marketplace should require a runtime-contract file beside every skill manifest. The file should state enforceable behavioral rules, required trace events, value-flow limits, temporal preconditions, permitted intervention points, global cross-skill rules, and the test cases used to verify them.

Deployment should then preserve an execution receipt: skill hash, policy version, observed trace, allowed calls, blocked calls, witness or reason, operator override, user-visible consequence, and rollback path. A skill is not trustworthy because its description sounds careful. It is trustworthy only when its declared contract can be checked against what the agent actually did.
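One way to picture such a receipt is as a small serializable record. The Python sketch below is illustrative only: the field names mirror the list above, while the `ExecutionReceipt` class, the `build_receipt` helper, and the choice of SHA-256 for the skill hash are assumptions rather than anything specified by VIGIL.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ExecutionReceipt:
    """Hypothetical receipt preserved for one agent run."""
    skill_hash: str
    policy_version: str
    observed_trace: list
    allowed_calls: list            # indices into observed_trace
    blocked_calls: list            # indices into observed_trace
    reason: str                    # witness or human-readable reason
    operator_override: Optional[str] = None
    user_visible_consequence: str = ""
    rollback_path: str = ""


def build_receipt(skill_bytes: bytes, policy_version: str, trace: list,
                  blocked_indices: list, reason: str) -> ExecutionReceipt:
    """Assemble a receipt; every call not blocked is recorded as allowed."""
    blocked = set(blocked_indices)
    return ExecutionReceipt(
        skill_hash=hashlib.sha256(skill_bytes).hexdigest(),
        policy_version=policy_version,
        observed_trace=trace,
        allowed_calls=[i for i in range(len(trace)) if i not in blocked],
        blocked_calls=sorted(blocked),
        reason=reason,
    )


receipt = build_receipt(
    skill_bytes=b"example skill bundle contents",
    policy_version="policy-v1",
    trace=[{"tool": "validate_file"}, {"tool": "send_file"}],
    blocked_indices=[1],
    reason="send_file called on a path that was never validated",
)

# Serialize with sorted keys so the stored record is stable and diffable.
receipt_json = json.dumps(asdict(receipt), sort_keys=True, indent=2)
print(receipt_json)
```

Hashing the skill bundle ties the receipt to an exact skill version, and storing call indices rather than copies keeps the allowed and blocked lists anchored to the single observed trace.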

Sources

Ying Li, Yanju Chen, Hongbo Wen, Bosi Zhang, Hanzhi Liu, Peiran Wang, Yu Feng, and Yuan Tian, VIGIL: Runtime Enforcement of Behavioral Specifications in AI Agent Skills, arXiv:2606.26524 [cs.CR], submitted June 25, 2026.
arXiv HTML and PDF versions of VIGIL: Runtime Enforcement of Behavioral Specifications in AI Agent Skills, reviewed June 25, 2026; PDF: arxiv.org/pdf/2606.26524.
Related pages: The Skill Manifest Becomes the Permission Boundary, The Agent Skill Becomes the Detector Surface, The Tool Scope Becomes the Intent Gate, The Agent Rulebook Leaves the Prompt, and The Out-of-Band Defense Becomes the Reference Monitor.
