Last reviewed June 24, 2026

The Delegation Trace Becomes the Audit Boundary

The June 2026 arXiv paper Observability for Delegated Execution in Agentic AI Systems, by Abhinav Mishra and Kumar Sharad, argues that ordinary audit logs and tracing metadata cannot reliably reconstruct which actions belonged to a particular delegated authority once an AI agent works across tools, retries, and sub-agents.

Delegation Is Not a Trace

Agent governance often reaches for the receipt: keep the log, store the trace, preserve the transcript, and the institution will know what happened. Mishra and Sharad's paper, arXiv:2606.09692, is useful because it names the weak point in that comfort. A trace can show a sequence of calls. An audit log can show actor, action, resource, outcome, and time. Neither one necessarily shows which durable delegation gave an action its authority.

The distinction matters in agentic systems because execution is not a fixed workflow. The paper describes LLM-based agents that can choose different tool sequences for the same instruction, backtrack after failures, break work into subtasks, and spawn cooperating sub-agents. Those behaviors can fragment records across tools and interleave multiple delegations inside the same runtime. A calendar update, repository read, Slack post, and ticket change may all be logged, but the organization may still be unable to answer a narrower accountability question: which of those events belonged to the same delegated job?

This is different from the problem discussed in The Agent Log Becomes the Receipt. A receipt proves that something crossed an operational boundary. The Mishra and Sharad paper asks whether the receipt also carries the authority boundary. If it does not, then the record is useful but incomplete. It can support debugging, billing, and incident timelines while failing at delegation-scoped accountability.

What CIM Adds

The paper proposes an agent-aware Common Information Model, or CIM. Its core move is simple: bind delegation context at execution time rather than infer it later from timestamps, trace IDs, or local attributes. The required event envelope includes time, event_id, delegation_id, principal.user_id, agent.agent_id, tool.name, and action.semantic. Optional fields add delegation_parent_id, trace and span identifiers, resource details, target principals, sensitivity labels, and workflow correlation.
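As a rough illustration of that envelope, the sketch below models the required and optional fields as a Python dataclass. The field list comes from the paragraph above; the flattened snake_case names, the types, the rule that required fields must be non-empty, and the sample values are assumptions made for this example, not the paper's schema definition.

```python
from dataclasses import dataclass
from typing import Optional

# Required envelope fields as listed above. Dotted names such as
# principal.user_id are flattened to snake_case here (an assumption).
REQUIRED = (
    "time", "event_id", "delegation_id", "principal_user_id",
    "agent_agent_id", "tool_name", "action_semantic",
)


@dataclass(frozen=True)
class CimEvent:
    # Required fields
    time: str
    event_id: str
    delegation_id: str
    principal_user_id: str
    agent_agent_id: str
    tool_name: str
    action_semantic: str
    # Optional fields (names and types are illustrative)
    delegation_parent_id: Optional[str] = None
    trace_id: Optional[str] = None
    span_id: Optional[str] = None
    resource: Optional[dict] = None
    target_principals: tuple = ()
    sensitivity: Optional[str] = None
    workflow_id: Optional[str] = None

    def __post_init__(self):
        # Reject events whose required fields are empty, so a missing
        # delegation_id cannot slip through as a blank string.
        missing = [name for name in REQUIRED if not getattr(self, name)]
        if missing:
            raise ValueError(f"missing required CIM fields: {missing}")


# Hypothetical sample event.
event = CimEvent(
    time="2026-06-24T10:00:00Z",
    event_id="evt-1",
    delegation_id="del-42",
    principal_user_id="alice",
    agent_agent_id="agent-7",
    tool_name="calendar",
    action_semantic="write",
)
```

The point of the validation step is that delegation context is checked when the event is created, not reconstructed afterward.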

That schema separates two structures that are easy to collapse. The execution graph describes causal flow: which call produced which later call, which span had which parent, and how a workflow unfolded. The authority graph describes delegation flow: which principal authorized which agent, whether authority was re-delegated, and which child delegation inherited from which parent. The paper's non-identifiability argument is that standard telemetry can contain the same visible events under multiple incompatible delegation assignments. The missing value is not a better timestamp. It is the delegation membership relation itself.
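A toy version of the non-identifiability argument, under invented data: three events, each in its own trace (as can happen after retries), and two different delegation assignments. Once the delegation field is dropped, both assignments produce identical telemetry. The event shapes and identifiers are hypothetical and only illustrate the idea; this is not the paper's formal construction.

```python
# Events as trace-only telemetry would record them: no delegation field.
base_events = [
    {"event_id": "e1", "trace_id": "t1", "tool": "calendar", "time": 1},
    {"event_id": "e2", "trace_id": "t2", "tool": "repo", "time": 2},
    {"event_id": "e3", "trace_id": "t3", "tool": "slack", "time": 3},
]

# Two incompatible answers to "which delegation did each event belong to?"
assignment_a = {"e1": "d1", "e2": "d1", "e3": "d2"}
assignment_b = {"e1": "d1", "e2": "d2", "e3": "d2"}


def world(assignment):
    """Ground truth: each event annotated with its delegation."""
    return [dict(e, delegation_id=assignment[e["event_id"]]) for e in base_events]


def observe(full_events):
    """What standard telemetry keeps: everything except delegation membership."""
    return [{k: v for k, v in e.items() if k != "delegation_id"} for e in full_events]


world_a, world_b = world(assignment_a), world(assignment_b)

# The two worlds differ, but their observable records are identical,
# so no query over the observable records can tell them apart.
indistinguishable = observe(world_a) == observe(world_b)
```

Here `indistinguishable` is true even though `world_a` and `world_b` disagree about event e2, which is why a sharper timestamp or a longer trace does not close the gap.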

CIM also normalizes cross-tool action semantics. Tool-native operations such as a file read, document download, repository update, share-link creation, or workflow invocation are mapped into compact action classes such as read, write, share, invoke, delegate, and spawn. That matters because an agent's footprint is rarely inside one product. Without normalization, every audit query becomes a local translation exercise. With normalization, investigators can ask whether a delegation read restricted resources, performed its first write, shared outside the organization, or expanded into new resource classes.
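One way to picture the normalization step is a lookup table from tool-native operations to action classes, plus a query written once against the normalized field. The tool and operation names below are made up for illustration, and mapping unmapped operations to "unknown" is an assumption of this sketch, not something the paper is quoted as specifying.

```python
# Hypothetical (tool, native operation) pairs mapped to action classes.
NATIVE_TO_ACTION = {
    ("drive", "files.get"): "read",
    ("drive", "files.download"): "read",
    ("repo", "push"): "write",
    ("drive", "permissions.create"): "share",
    ("workflow", "run"): "invoke",
}


def normalize(tool, operation):
    """Return the normalized action class, or 'unknown' if unmapped."""
    return NATIVE_TO_ACTION.get((tool, operation), "unknown")


def first_write(events, delegation_id):
    """Earliest write event under one delegation, or None if there is none."""
    writes = [
        e for e in events
        if e["delegation_id"] == delegation_id and e["action"] == "write"
    ]
    return min(writes, key=lambda e: e["time"], default=None)


# Invented events from three different tools under two delegations.
events = [
    {"delegation_id": "d1", "time": 1, "action": normalize("drive", "files.get")},
    {"delegation_id": "d1", "time": 3, "action": normalize("repo", "push")},
    {"delegation_id": "d2", "time": 2, "action": normalize("repo", "push")},
    {"delegation_id": "d1", "time": 5, "action": normalize("repo", "push")},
]

hit = first_write(events, "d1")
```

The query never mentions a tool name, which is the practical benefit of normalizing: the same question works across every tool the agent touched.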

The Gateway Boundary

The implementation pattern is a gateway between the agent runtime and external tools or services. The paper says the gateway may be a library, sidecar, proxy, or MCP middleware. Its job is not to read prompts, judge intent, or inspect payload bodies. Its job is to mint or validate delegation identifiers, bind them to authenticated principals, propagate that context through downstream tool calls, normalize tool operations into CIM action semantics, and emit an explicit unknown state when delegation context is missing.
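The responsibilities listed above can be sketched as a small in-process wrapper. This is a minimal illustration in the "library" form, not the paper's implementation: the class name, method names, event keys, and in-memory registry are all hypothetical. Note that the payload is passed through to the tool untouched and never recorded.

```python
import uuid
from datetime import datetime, timezone

UNKNOWN = "unknown"


class DelegationGateway:
    """Hypothetical gateway sitting between an agent runtime and its tools."""

    def __init__(self, action_map):
        self._action_map = action_map   # (tool, operation) -> action class
        self._delegations = {}          # delegation_id -> principal and parent
        self.events = []                # emitted events (in memory for the sketch)

    def mint(self, user_id, parent_id=None):
        """Create a delegation bound to a principal, optionally under a parent."""
        if parent_id is not None and parent_id not in self._delegations:
            raise KeyError(f"unknown parent delegation: {parent_id}")
        delegation_id = f"del-{uuid.uuid4()}"
        self._delegations[delegation_id] = {"user_id": user_id, "parent_id": parent_id}
        return delegation_id

    def call(self, tool_fn, payload, *, tool_name, operation, agent_id,
             delegation_id=None):
        """Emit an attributed event, then forward the call to the tool."""
        record = self._delegations.get(delegation_id)
        self.events.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "event_id": f"evt-{uuid.uuid4()}",
            # Missing or unrecognized context becomes an explicit unknown.
            "delegation_id": delegation_id if record else UNKNOWN,
            "delegation_parent_id": record["parent_id"] if record else None,
            "principal.user_id": record["user_id"] if record else UNKNOWN,
            "agent.agent_id": agent_id,
            "tool.name": tool_name,
            "action.semantic": self._action_map.get((tool_name, operation), UNKNOWN),
        })
        return tool_fn(payload)  # payload is forwarded, not inspected or logged


gateway = DelegationGateway({("calendar", "events.update"): "write"})
root = gateway.mint("alice")
child = gateway.mint("alice", parent_id=root)

result = gateway.call(lambda p: "ok", {"title": "standup"},
                      tool_name="calendar", operation="events.update",
                      agent_id="agent-7", delegation_id=child)

# A call that arrives without delegation context is still recorded.
gateway.call(lambda p: "ok", {}, tool_name="calendar",
             operation="events.update", agent_id="agent-7")
```

The second call shows the design choice that matters most: a call without delegation context is recorded as unknown rather than dropped or guessed.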

This is an observability boundary, not an enforcement engine. The paper is careful about scope: CIM does not decide whether an action was benign, malicious, or caused by prompt injection. It attributes the action to the delegation under which it executed and preserves enough structure for later forensic questions. That positioning is important. Runtime policy engines and intent-scoped tool authorization decide what should be allowed. CIM tries to make the resulting activity reconstructable after it happens.

The experimental section evaluates delegation reconstruction, operational overhead, and query construction burden in two environments. The first is a synthetic generator that produces overlap, spawning, retries, and backtracking. The second is a LangGraph micro-deployment that emits roughly 70,000 events in its default configuration. In the arXiv HTML version, the direction of the result is clearer than the individual numeric cells: CIM reconstructs covered delegations by construction, while trace-only and window-based baselines degrade when traces are partial, interleaved, or missing.

Governance Standard

An organization deploying agents should treat delegation context as a required telemetry field, not a nice-to-have annotation. The minimum useful record is not just who called which tool. It is who issued the delegated authority, which agent instance exercised it, which tool operation was observed, which normalized action it became, which resource or recipient was touched, what delegation_id bound the event, and whether that delegation had a parent.

The standard should also state the capture boundary. If the gateway covers the declared tool inventory, absence of CIM events is meaningful. If coverage is best effort, uncovered tools must be named as unknown rather than silently folded into the audit story. That is the bridge to agent identity, agent-to-agent handshakes, provenance layers, and AI audit trails: identity, delegation, and provenance have to meet in the same event model.
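A capture-boundary statement can be made mechanical. The sketch below compares a declared tool inventory against the tools a gateway covers and labels each one; the function name, the three status labels, and the sample inventory are assumptions for illustration only.

```python
def coverage_report(declared_tools, covered_tools, events):
    """Label each declared tool by what the audit record can say about it."""
    seen = {e["tool_name"] for e in events}
    report = {}
    for tool in sorted(declared_tools):
        if tool not in covered_tools:
            # Outside the gateway: silence proves nothing, so say so.
            report[tool] = "unknown"
        elif tool in seen:
            report[tool] = "observed"
        else:
            # Covered and silent: absence of events is meaningful here.
            report[tool] = "no activity"
    return report


# Hypothetical inventory: slack is declared but not routed through the gateway.
report = coverage_report(
    declared_tools={"calendar", "repo", "slack"},
    covered_tools={"calendar", "repo"},
    events=[{"tool_name": "calendar", "delegation_id": "del-1"}],
)
```

In this example `repo` is "no activity" while `slack` is "unknown". Both have zero events, but only the first supports a claim that nothing happened.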

This also changes post-incident review. The first question is no longer only "What did the agent do?" It is "Under which delegated authority did each action occur, and where did that authority propagate?" A model-mediated system that cannot answer that question is not merely under-instrumented. It has made accountability depend on reconstruction guesses.
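The propagation question can be sketched as a query over events that carry delegation_id and delegation_parent_id: collect a root delegation and everything re-delegated beneath it, then list the actions in that scope. The event dictionaries and function names are hypothetical, and the sketch assumes each child delegation emitted at least one event naming its parent.

```python
from collections import defaultdict


def delegation_subtree(events, root_id):
    """Return root_id plus every delegation that descends from it."""
    children = defaultdict(set)
    for e in events:
        parent = e.get("delegation_parent_id")
        if parent is not None:
            children[parent].add(e["delegation_id"])
    seen, stack = set(), [root_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(children[current])
    return seen


def actions_under(events, root_id):
    """All events executed under the root delegation or its descendants."""
    scope = delegation_subtree(events, root_id)
    return [e for e in events if e["delegation_id"] in scope]


# Invented incident data: d1 spawns d2, d2 spawns d3, d9 is unrelated.
events = [
    {"event_id": "e1", "delegation_id": "d1", "action": "read"},
    {"event_id": "e2", "delegation_id": "d2", "delegation_parent_id": "d1", "action": "spawn"},
    {"event_id": "e3", "delegation_id": "d3", "delegation_parent_id": "d2", "action": "share"},
    {"event_id": "e4", "delegation_id": "d9", "action": "write"},
]

scope = delegation_subtree(events, "d1")
in_scope_ids = [e["event_id"] for e in actions_under(events, "d1")]
```

The query reads the authority graph directly from the recorded parent links. It never consults timestamps or trace IDs.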

What This Changes

The delegation trace becomes the audit boundary when work is no longer located in a single user click. The agent acts through tools, through time, and sometimes through other agents. The institution needs a record of the authority that traveled with those actions.

The Spiralist rule is to log the permission, not just the motion. A trace without delegation context tells a story of movement. Governance needs the story of authorization: who let the motion begin, where it forked, which tools it touched, what it shared, and which gaps remain outside the declared boundary.

Sources

Abhinav Mishra and Kumar Sharad, Observability for Delegated Execution in Agentic AI Systems, arXiv:2606.09692 [cs.CR], submitted June 8, 2026.
arXiv experimental HTML for Observability for Delegated Execution in Agentic AI Systems, reviewed June 24, 2026.
Related pages: The Agent Log Becomes the Receipt, The Agent Identity Becomes the Service Account, The Agent Rulebook Leaves the Prompt, The Tool Scope Becomes the Intent Gate, The Agent-to-Agent Protocol Becomes the Handshake, The Provenance Layer Becomes the Truth Machine, The Source ID Becomes the Factuality Test, AI Agents, and AI Audit Trails.
