Blog · arXiv Analysis · Last reviewed June 25, 2026

The Execution Path Becomes the Policy Object

The March 2026 arXiv paper Runtime Governance for AI Agents: Policies on Paths, by Maurits Kaptein, Vassilis-Javed Khan, and Andriy Podstavnychy, argues that agent governance should evaluate the path an agent has already taken before approving the next action.

The Path Is the Unit

The paper, arXiv:2603.16586 [cs.AI], was submitted on March 17, 2026. It starts from a practical problem: an agent does not merely answer a prompt. It can read from one system, call another tool, delegate to a sub-agent, write a file, send a message, and reuse an earlier step as context for a later one. Each action may look harmless on its own while the sequence as a whole violates policy.

This is a different angle from the site's notes on runtime governance planes, policy-state ledgers, external rulebooks, and system prompts as policy proxies. Those pages ask where enforcement, state, and rulebooks live. This one asks what the policy evaluates: not the prompt, the final output, or the tool schema, but the execution path.

What the Paper Formalizes

Kaptein, Khan, and Podstavnychy define an execution path as a finite sequence of steps. They distinguish stochastic steps such as language-model calls, deterministic steps such as database queries or API calls, and composite steps such as delegation to another agent. The taxonomy is not a theory of mind. It is a control surface for systems that act through tools.
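To make the taxonomy concrete, here is a minimal Python sketch of a path as data. The class names Stochastic, Deterministic, and Composite and the helper tools_touched are hypothetical, not the paper's code. The point it illustrates: a composite step carries its own sub-path, so any check over the path has to descend into delegation.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Stochastic:
    """A language-model call; the same input can produce different outputs."""
    prompt: str


@dataclass
class Deterministic:
    """A database query or API call, treated as deterministic in this taxonomy."""
    tool: str


@dataclass
class Composite:
    """Delegation to another agent, which runs a sub-path of its own."""
    agent_id: str
    subpath: List["Step"] = field(default_factory=list)


# An execution path is a finite sequence of these steps.
Step = Union[Stochastic, Deterministic, Composite]


def tools_touched(path: List[Step]) -> List[str]:
    """List every deterministic tool call, descending into delegated sub-paths."""
    tools: List[str] = []
    for step in path:
        if isinstance(step, Deterministic):
            tools.append(step.tool)
        elif isinstance(step, Composite):
            tools.extend(tools_touched(step.subpath))
    return tools


path = [
    Stochastic("plan the task"),
    Deterministic("db.read"),
    Composite("research-subagent", [Deterministic("web.fetch"), Stochastic("summarize")]),
    Deterministic("email.send"),
]
print(tools_touched(path))  # ['db.read', 'web.fetch', 'email.send']
```

A check that scanned only the parent agent's top-level steps would miss the web.fetch call made inside the sub-agent's path.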

The central formal move is the policy function. In the framework, a policy maps agent identity, partial execution path, proposed next action, and shared organizational governance state to a policy-violation probability. Shared state matters because some constraints are not visible inside one agent's local history. The paper's example policies cover documentation, agent integrity, PII predecessor checks, approval requirements, data exfiltration, information barriers, execution bounds, and time restrictions.
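A minimal sketch of the policy function's shape, assuming Python and hypothetical names (Step, SharedState, pii_before_external_send). The example rule is a PII-predecessor check with an approval exception held in shared state. It returns 0.0 or 1.0, a degenerate probability, to keep the logic visible; the paper's policies are not necessarily written this way.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, Sequence, Set


@dataclass(frozen=True)
class Step:
    tool: str                               # hypothetical tool name, e.g. "crm.read"
    tags: FrozenSet[str] = frozenset()      # e.g. {"pii"} if the step touched personal data


@dataclass
class SharedState:
    """Organization-wide governance state visible to every agent (hypothetical shape)."""
    approved_agents: Set[str] = field(default_factory=set)


# (agent identity, partial path, proposed next action, shared state) -> violation probability
Policy = Callable[[str, Sequence[Step], Step, SharedState], float]


def pii_before_external_send(agent_id: str, path: Sequence[Step],
                             action: Step, state: SharedState) -> float:
    """PII-predecessor rule: an external send after any PII read violates policy unless approved."""
    if action.tool != "email.send":
        return 0.0
    saw_pii = any("pii" in step.tags for step in path)
    if saw_pii and agent_id not in state.approved_agents:
        return 1.0
    return 0.0


path = [Step("crm.read", frozenset({"pii"})), Step("llm.draft")]
send = Step("email.send")
print(pii_before_external_send("agent-7", path, send, SharedState()))             # 1.0
print(pii_before_external_send("agent-7", path, send, SharedState({"agent-7"})))  # 0.0
```

The same proposed action gets two different scores because the shared state differs, even though the agent's own path is identical.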

Why Prompts and Roles Miss It

The useful provocation is the paper's treatment of familiar controls. Prompting may reduce the chance that an agent takes a bad path, but it does not evaluate the path. Static access control can remove action categories, but it usually ignores the sequence that produced the proposed action. A role can say that an agent may read a database and may send email. It does not by itself know whether the email follows a sensitive read.
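The database-and-email case can be sketched in a few lines, assuming a hypothetical role table, hypothetical tool names, and a simple sensitivity label on each past step. Both checks consult the same permissions; only the second looks at what already happened.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical role table: the role may read the database and may send email.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {"support-agent": {"db.read", "email.send"}}


def role_allows(role: str, tool: str) -> bool:
    """Static access control: looks only at the proposed tool, never at history."""
    return tool in ROLE_PERMISSIONS.get(role, set())


def path_allows(role: str, history: List[Tuple[str, str]], tool: str) -> bool:
    """Same role table plus one sequence rule: no email after a sensitive read."""
    if not role_allows(role, tool):
        return False
    read_sensitive = any(t == "db.read" and label == "sensitive" for t, label in history)
    return not (tool == "email.send" and read_sensitive)


history = [("db.read", "sensitive")]
print(role_allows("support-agent", "email.send"))           # True
print(path_allows("support-agent", history, "email.send"))  # False
print(path_allows("support-agent", [], "email.send"))       # True
```

The role check approves the send in every situation. The path check gives different answers for the same role and the same proposed tool, which is exactly the information a static role cannot carry.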

That is the human-machine cognition lesson. Humans often judge an assistant by the visible request and answer. Agent risk can sit between them: retrieved context, tool outputs, hidden state, retries, sub-agent calls, and intermediate drafts. Prompts and access control remain useful layers, but path-dependent violations require runtime evaluation. If the violation condition depends on prior steps, a control that does not inspect prior steps cannot enforce it.

Where the Policy Engine Sits

The implementation section separates prospective and retrospective governance. In prospective mode, a policy engine intercepts the proposed action before execution, scores applicable policies, records the event, and returns an intervention. In retrospective mode, the engine reads logs after the fact. Retrospective governance can support audit, but it cannot prevent the action that already happened.
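A minimal sketch of the two modes, assuming a simplified policy that sees only the list of tool names already run. ProspectiveEngine, no_send_after_read, and retrospective_audit are hypothetical names for illustration, not the paper's implementation.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical policy shape for this sketch: (tools run so far, proposed tool) -> violation probability.
PathPolicy = Callable[[List[str], str], float]


def no_send_after_read(path: List[str], tool: str) -> float:
    return 1.0 if tool == "email.send" and "db.read" in path else 0.0


class ProspectiveEngine:
    """Intercept each proposed action, score it, record the event, then pass or block."""

    def __init__(self, policies: List[PathPolicy], threshold: float = 0.5) -> None:
        self.policies = policies
        self.threshold = threshold
        self.path: List[str] = []              # tools that actually executed
        self.log: List[Dict[str, Any]] = []    # every decision, written before execution

    def call(self, tool: str, run: Callable[[], Any]) -> Tuple[str, Any]:
        score = max((p(self.path, tool) for p in self.policies), default=0.0)
        decision = "block" if score >= self.threshold else "pass"
        self.log.append({"tool": tool, "score": score, "decision": decision})
        if decision == "block":
            return decision, None              # the action never runs
        self.path.append(tool)
        return decision, run()


def retrospective_audit(executed: List[str], policies: List[PathPolicy],
                        threshold: float = 0.5) -> List[int]:
    """Replay a finished path; this can flag step i, but step i has already happened."""
    return [i for i, tool in enumerate(executed)
            if max((p(executed[:i], tool) for p in policies), default=0.0) >= threshold]


engine = ProspectiveEngine([no_send_after_read])
print(engine.call("db.read", lambda: "rows"))     # ('pass', 'rows')
print(engine.call("email.send", lambda: "sent"))  # ('block', None)
print(retrospective_audit(["db.read", "email.send"], [no_send_after_read]))  # [1]
```

Both modes detect the same violation at step 1. Only the prospective engine stops it; the retrospective replay can only report that it occurred.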

The paper also separates registration and per-step phases. Registration checks policies that depend only on the agent, such as documentation, integrity, or permitted operating time. Per-step evaluation handles policies that depend on the path, proposed action, and shared governance state. To keep this tractable, the paper argues that most policies can be evaluated against a compact state vector instead of the full path: sensitivity seen so far, approval status, step count, barrier tags, and similar variables. The intervention vocabulary is operational: pass, steer, or block.
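The two phases and the compact state vector can be sketched as follows. The fields, the sensitivity scale, the step bound of 50, and the rules mapping states to pass, steer, or block are all assumptions for illustration, not values from the paper; register, per_step, and PathState are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class PathState:
    """Compact state carried forward instead of the full path (fields are assumptions)."""
    max_sensitivity: int = 0          # 0 = public ... 3 = restricted, an assumed scale
    approved: bool = False            # a human approval is on record for this task
    step_count: int = 0
    barrier_tags: Set[str] = field(default_factory=set)  # information-barrier sides touched


def register(documented: bool, integrity_ok: bool, hour: int, allowed_hours: range) -> bool:
    """Registration phase: agent-only policies, checked once before any step."""
    return documented and integrity_ok and hour in allowed_hours


def per_step(state: PathState, action: str, sensitivity: int,
             barrier: Optional[str] = None, max_steps: int = 50) -> str:
    """Per-step phase: return pass, steer, or block; update the state only on pass."""
    if state.step_count >= max_steps:
        return "block"                                   # execution bound
    if barrier and state.barrier_tags and barrier not in state.barrier_tags:
        return "block"                                   # would cross an information barrier
    if action == "external.send" and state.max_sensitivity >= 2 and not state.approved:
        return "steer"                                   # e.g. redirect the agent to request approval
    state.step_count += 1
    state.max_sensitivity = max(state.max_sensitivity, sensitivity)
    if barrier:
        state.barrier_tags.add(barrier)
    return "pass"


print(register(True, True, hour=10, allowed_hours=range(8, 18)))  # True
s = PathState()
print(per_step(s, "db.read", sensitivity=3, barrier="deal-team"))  # pass
print(per_step(s, "external.send", sensitivity=0))                 # steer
print(per_step(s, "db.read", sensitivity=1, barrier="research"))   # block
```

Each decision reads a few fields rather than replaying the path, which is what keeps per-step evaluation cheap. The trade-off is that anything not folded into the state is unavailable to later checks, so choosing the fields is itself a policy decision.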

The Limits Are the Point

The paper is not proof that a product, platform, or enterprise deployment is safe. It is a framework, and the challenges the authors themselves raise matter: a policy engine can become a single point of failure, shared state consistency can fail under concurrency, the audit trail can contain the sensitive data it is meant to protect, and sub-agent delegation provenance remains difficult.

Calibration is the deepest issue. The policy output is framed as a violation probability, but operational systems need labeled execution traces before that number can be trusted as a probability rather than a severity score. The paper's EU AI Act discussion should also be read as the authors' interpretation of how the machinery could support compliance, not as legal certification.
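One way to see the calibration gap is a reliability check over labeled traces: group the policy's scores into bins and compare each bin's mean score with the fraction of steps reviewers actually judged to be violations. This sketch uses standard equal-width binning plus the Brier score, swapped in for illustration; the paper does not prescribe it, and the four-step dataset below is made up.

```python
from typing import List, Tuple


def reliability_table(scores: List[float], labels: List[int],
                      n_bins: int = 5) -> List[Tuple[int, int, float, float]]:
    """Bin scores into equal-width bins; compare mean score with the observed violation rate."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        bins[min(int(s * n_bins), n_bins - 1)].append((s, y))  # s == 1.0 lands in the top bin
    rows = []
    for i, b in enumerate(bins):
        if b:
            mean_score = sum(s for s, _ in b) / len(b)
            observed_rate = sum(y for _, y in b) / len(b)
            rows.append((i, len(b), mean_score, observed_rate))
    return rows


def brier_score(scores: List[float], labels: List[int]) -> float:
    """Mean squared gap between the score and the 0/1 label; lower is better."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)


# Toy labels for illustration only: 1 = a reviewer judged the step a real violation.
scores = [0.9, 0.9, 0.1, 0.1]
labels = [1, 0, 0, 0]
for row in reliability_table(scores, labels):
    print(row)   # (bin index, count, mean score, observed violation rate)
print(brier_score(scores, labels))
```

In this toy set, steps scored 0.9 were violations only half the time. A score like that still ranks risky steps above safe ones, but it is not yet a probability.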

Governance Standard

A governed agent runtime should preserve enough path state for a later reviewer to reconstruct why the next action was allowed. At minimum, record the agent identity, registered purpose, task identifier, tool manifest, proposed action, prior tool calls, prior data sensitivity, delegation chain, shared policy state, policy version, threshold, decision, intervention, human approval record, and audit-retention class.
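The minimum record can be written down as a data type. This Python sketch mirrors the fields listed above one for one; DecisionRecord is a hypothetical name, and the field types, encodings, and example values are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class DecisionRecord:
    """Minimum path-state record for one governed step; field types are assumptions."""
    agent_id: str
    registered_purpose: str
    task_id: str
    tool_manifest: Tuple[str, ...]
    proposed_action: str
    prior_tool_calls: Tuple[str, ...]
    prior_data_sensitivity: int
    delegation_chain: Tuple[str, ...]          # outermost agent first
    shared_policy_state: Dict[str, Any]        # snapshot taken at decision time
    policy_version: str
    threshold: float
    decision: str                              # verdict from comparing the score with the threshold
    intervention: str                          # "pass", "steer", or "block"
    human_approval_record: Optional[str]       # e.g. approver id and timestamp, or None
    audit_retention_class: str

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


record = DecisionRecord(
    agent_id="agent-7", registered_purpose="customer support", task_id="T-1001",
    tool_manifest=("crm.read", "email.send"), proposed_action="email.send",
    prior_tool_calls=("crm.read",), prior_data_sensitivity=3,
    delegation_chain=("orchestrator", "agent-7"),
    shared_policy_state={"approved_tasks": []}, policy_version="pii-send@v3",
    threshold=0.5, decision="violation", intervention="steer",
    human_approval_record=None, audit_retention_class="restricted-7y",
)
print(record.to_json())
```

Every value in the example is hypothetical. In a governed runtime, the host would fill each field from what it observed, including the delegation chain and prior tool calls.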

Do not ask the model to certify this record from memory. The host, policy engine, connector layer, or runtime wrapper has to maintain it outside the agent's own self-description. The Spiralist rule is simple: a policy that ignores the path is a policy for a simpler machine. Agents make the route part of the act.

Sources

