Blog · arXiv Analysis · Last reviewed June 24, 2026

The Fault Investigator Becomes the Accountability Layer

The June 2026 arXiv paper SAFARI: Scaling Long Horizon Agentic Fault Attribution via Active Investigation, by Chenyang Zhu and twelve coauthors, proposes an investigator for agent traces that exceed ordinary context limits. Its governance lesson is that fault attribution becomes its own accountability layer once agents act through long, multi-step records.

When the Log Outgrows Judgment

An agent failure is rarely one line of text. It is a trace: prompts, tool calls, observations, retrieved files, intermediate plans, rewritten goals, messages between agents, hidden state, retry loops, and final actions. As agent work becomes longer and more collaborative, the trace can become too large for an ordinary reviewer, and even too large for an evaluator model's context window. At that point, accountability cannot be reduced to "read the log."

The SAFARI paper begins there. It says autonomous agents now produce multi-step and multi-agent execution trajectories that can exceed even large context windows. Loading the whole trace into a model creates attention dilution and eventually fails when the trace exceeds the window. That makes failure diagnosis a governance problem: the institution may have records, but no practical way to locate the step where responsibility became actionable.

This question is distinct from the ones raised by agent receipts, delegation traces, and incident reports. The related pages on those topics ask what records should exist. SAFARI asks how an evaluator can investigate a record that is too large to ingest.

What SAFARI Builds

The paper, arXiv:2606.24626, was submitted on June 23, 2026. The authors are Chenyang Zhu, Jiayu Yao, Kushal Chawla, Youbing Yin, Nathan Wolfe, Pengshan Cai, Jingyu Wu, Spencer Hong, Sangwoo Cho, Shi-Xiong Zhang, Daben Liu, Sambit Sahu, and Erin Babinsky. The arXiv entry says the work was published at the Second Workshop on Agents in the Wild: Safety, Security, and Beyond at ICML 2026.

SAFARI stands for Scaling long-horizon Agentic Fault Attribution via active Investigation. The paper defines the decisive fault as the earliest uncorrected error in an agentic trace. Instead of passively loading the full trajectory, SAFARI uses an Investigator Agent that queries the trace. Its toolset includes read(offset, limit), which traverses segments of the trace, and search(pattern), which runs case-insensitive regex queries over the serialized trajectory.
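To make the two tools concrete, here is a minimal sketch of what read(offset, limit) and search(pattern) could look like over a serialized trace. It is an illustration, not the paper's implementation: the TraceTools class, the choice to store the trace as a list of lines, the use of line indices as offsets, and the return format are all assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class TraceTools:
    """Hypothetical wrapper around a serialized trace, stored as a list of lines."""

    lines: list[str]

    def read(self, offset: int, limit: int) -> list[tuple[int, str]]:
        """Return up to `limit` lines starting at `offset`, paired with their indices."""
        start = max(offset, 0)
        return list(enumerate(self.lines[start:start + limit], start=start))

    def search(self, pattern: str) -> list[tuple[int, str]]:
        """Return every line that matches `pattern` as a case-insensitive regex."""
        regex = re.compile(pattern, re.IGNORECASE)
        return [(i, line) for i, line in enumerate(self.lines) if regex.search(line)]


# Toy trace, invented for illustration.
trace = TraceTools(lines=[
    "step 0 planner: find the 2023 revenue figure",
    "step 1 tool_call: web_search('revenue 2022')",
    "step 2 observation: Revenue in 2022 was 4.1M",
    "step 3 final_answer: 4.1M",
])

hits = trace.search(r"REVENUE \d{4}")   # matches line 1 only, despite the uppercase pattern
window = trace.read(offset=1, limit=2)  # lines 1 and 2, with their indices
```

Returning indices alongside text lets a later claim cite the exact location it relied on, which is the property the rest of this post cares about.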

The system also keeps a Short-Term Memory block for cross-turn reasoning. That matters because an investigation can lose evidence when earlier tool observations are pushed out of the context window. The memory records the underlying task goal, hypotheses and evidence about failure steps, investigative gaps, and past tool calls.
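The four things the memory is said to record can be sketched as a small data structure. The field names, the Hypothesis type, and the render() text format below are assumptions made for illustration; the paper's actual memory layout may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    step: int                                          # suspected failure step
    claim: str                                         # what is believed to have gone wrong
    evidence: list[str] = field(default_factory=list)  # quoted trace excerpts


@dataclass
class ShortTermMemory:
    """Hypothetical memory block carried from one investigator turn to the next."""

    task_goal: str = ""
    hypotheses: list[Hypothesis] = field(default_factory=list)
    gaps: list[str] = field(default_factory=list)        # what has not been checked yet
    tool_calls: list[str] = field(default_factory=list)  # past read/search calls

    def render(self) -> str:
        """Serialize the memory as plain text for the next investigator turn."""
        parts = [f"GOAL: {self.task_goal}"]
        for h in self.hypotheses:
            parts.append(f"HYPOTHESIS step {h.step}: {h.claim}")
            parts.extend(f"  EVIDENCE: {quote}" for quote in h.evidence)
        parts.extend(f"GAP: {gap}" for gap in self.gaps)
        parts.extend(f"CALLED: {call}" for call in self.tool_calls)
        return "\n".join(parts)


memory = ShortTermMemory(task_goal="Report the 2023 revenue figure")
memory.tool_calls.append("search('revenue')")
memory.hypotheses.append(
    Hypothesis(step=1, claim="Searched the wrong year",
               evidence=["web_search('revenue 2022')"])
)
memory.gaps.append("No check yet for a later correction after step 1")
rendered = memory.render()
```

Because the rendered block is re-sent every turn, evidence survives even after the raw tool output that produced it has left the context window.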

Evidence Before Attribution

The important design move is verification before conclusion. Once the Investigator Agent has gathered enough evidence, it decomposes its hypothesis into up to three atomic claims. Each claim is sent to a Reasoning Evaluator LLM, which checks only that claim against quoted evidence. The evaluators do not see the whole underlying trace. They judge whether the cited evidence supports the stated conclusion and return confidence scores.
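The verification step can be sketched as follows. This is a simplified illustration under stated assumptions: AtomicClaim, verify, toy_evaluator, and the WEAK_BELOW threshold are hypothetical names, and the evaluator here is a stand-in function rather than an LLM call. Only the cap of three claims and the rule that an evaluator sees a claim plus its quoted evidence come from the paper as described above.

```python
from dataclasses import dataclass
from typing import Callable

MAX_CLAIMS = 3  # cap on atomic claims per hypothesis, as described in the paper


@dataclass(frozen=True)
class AtomicClaim:
    statement: str
    quoted_evidence: tuple[str, ...]


# An evaluator receives one claim with its quotes, never the full trace,
# and returns a confidence score.
Evaluator = Callable[[AtomicClaim], float]


def verify(claims: list[AtomicClaim], evaluate: Evaluator) -> list[tuple[AtomicClaim, float]]:
    """Score each claim independently against its own quoted evidence."""
    if not 1 <= len(claims) <= MAX_CLAIMS:
        raise ValueError(f"expected 1 to {MAX_CLAIMS} claims, got {len(claims)}")
    return [(claim, evaluate(claim)) for claim in claims]


def toy_evaluator(claim: AtomicClaim) -> float:
    """Stand-in for an LLM call: a claim with no quoted evidence scores 0.0."""
    return 0.9 if claim.quoted_evidence else 0.0


claims = [
    AtomicClaim("Step 1 searched for 2022, not 2023.", ("web_search('revenue 2022')",)),
    AtomicClaim("No later step corrected the year.", ()),
]
results = verify(claims, toy_evaluator)

WEAK_BELOW = 0.5  # hypothetical threshold, not taken from the paper
weak = [claim.statement for claim, score in results if score < WEAK_BELOW]
```

Scoring claims one at a time means an unsupported claim is flagged on its own instead of being carried along by a well-supported neighbor.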

For governance, verification before conclusion is the useful pattern. A fault investigator should not merely output a name, step number, or blame label. It should preserve a chain of evidence: what it searched, what it read, what claim it formed, which evidence supported the claim, which claims were weak, and why it stopped. Otherwise "the agent found the fault" becomes the same old accountability fog, now wrapped in a diagnostic interface.

The phrase accountability layer is deliberate. Fault attribution is not the whole incident process. It does not decide legal liability, worker discipline, user remedy, vendor breach, or product recall by itself. But it can make those later processes possible by locating the earliest uncorrected technical error and showing the route used to identify it.
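The target of the investigation, the earliest uncorrected error, can be stated precisely once steps are labeled. The sketch below assumes such labels already exist, which is exactly what an investigator has to establish in practice; StepLabel, corrected_at, and decisive_fault are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StepLabel:
    step: int
    is_error: bool
    corrected_at: Optional[int] = None  # later step that repaired the error, if any


def decisive_fault(labels: list[StepLabel]) -> Optional[int]:
    """Return the earliest step that is an error and was never corrected."""
    uncorrected = [l.step for l in labels if l.is_error and l.corrected_at is None]
    return min(uncorrected, default=None)


labels = [
    StepLabel(step=0, is_error=False),
    StepLabel(step=1, is_error=True, corrected_at=2),  # repaired, so not decisive
    StepLabel(step=3, is_error=True),                  # earliest uncorrected error
    StepLabel(step=5, is_error=True),
]
fault_step = decisive_fault(labels)  # 3
```

Note that the step 1 error is skipped because a later step repaired it. The definition picks a technical location in the trace, not a responsible party.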

Benchmarks and Limits

The authors evaluate SAFARI on Who&When and TRAIL. The paper reports that Who&When uses traces from Magentic-One and CaptainAgent, derived from GAIA and AssistantBench. TRAIL provides long-horizon agentic trajectories, with some traces exceeding two million tokens and the TRAIL GAIA subset peaking at 122 steps. The paper uses Claude-Opus-4.6 as its primary backbone model.

The headline results should be read as paper-reported benchmark results, not as proof of universal audit readiness. The paper reports a 20 percent improvement on Who&When within a one-million-token budget, a 19 percent improvement on the TRAIL GAIA subset under a 25,000-token budget, and 0.58 precision when the target fault is five times beyond the model's native context window. It also reports that at a one-million-token budget, SAFARI can perform slightly worse than the RAFFLES baseline, partly because the Short-Term Memory introduces a tradeoff between summarization and information loss when the raw log fits in context.

That limitation is valuable. It prevents a new ritual: treating active investigation as magic. Sometimes the best review is direct access to the raw record. Sometimes the better system is an investigator that can move through the record. Governance has to preserve both.

Governance Standard

A serious agent-failure review system needs more than logs. It needs trace schemas, stable step identifiers, tool-call provenance, model and prompt versions, memory snapshots, investigator queries, cited evidence, evaluator confidence, disagreement records, and human review points. The investigation record should be exportable to the incident process, not trapped inside an evaluation dashboard.
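One way to picture an exportable investigation record is a small schema that serializes to JSON. Everything below is a hypothetical sketch: the InvestigationRecord class, its field names, and the example values are invented for illustration and do not come from the paper or from any existing standard.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class InvestigationRecord:
    """Hypothetical export format for one fault investigation."""

    trace_id: str
    model_version: str
    prompt_version: str
    queries: list[dict] = field(default_factory=list)  # investigator tool calls
    claims: list[dict] = field(default_factory=list)   # claim, evidence, confidence
    attributed_step: Optional[str] = None              # stable step identifier
    stop_reason: str = ""
    human_review: Optional[dict] = None                # filled in by a reviewer

    def to_json(self) -> str:
        """Serialize with sorted keys so exports are stable and diffable."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


record = InvestigationRecord(
    trace_id="trace-001", model_version="model-x", prompt_version="prompt-v1"
)
record.queries.append({"tool": "search", "args": {"pattern": "revenue"}, "hits": 2})
record.claims.append({
    "statement": "Step 1 searched for 2022, not 2023.",
    "evidence": ["web_search('revenue 2022')"],
    "confidence": 0.9,
})
record.attributed_step = "step-1"
record.stop_reason = "all claims verified"
exported = record.to_json()
```

Keeping the export as plain JSON is one way to hand the record to an incident process rather than leaving it inside an evaluation dashboard. The empty human_review field makes the absence of review visible instead of silent.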

A review system also needs separation between diagnosis and punishment. Early fault localization can support learning, repair, and remedy. It can also become a machine-shaped blame assignment if managers use the first technical fault to ignore procurement pressure, workload design, ambiguous instructions, weak safeguards, or unsafe incentives. The investigator should identify evidence, not become the institution's excuse to stop asking wider questions.

The Spiralist rule is simple: when agents act through long traces, accountability must become investigative. A system that cannot show how it found the fault has not finished explaining the failure.

Sources

Chenyang Zhu, Jiayu Yao, Kushal Chawla, Youbing Yin, Nathan Wolfe, Pengshan Cai, Jingyu Wu, Spencer Hong, Sangwoo Cho, Shi-Xiong Zhang, Daben Liu, Sambit Sahu, and Erin Babinsky, SAFARI: Scaling Long Horizon Agentic Fault Attribution via Active Investigation, arXiv:2606.24626 [cs.AI], submitted June 23, 2026.
arXiv experimental HTML for SAFARI: Scaling Long Horizon Agentic Fault Attribution via Active Investigation, reviewed June 24, 2026.
Related pages: The Agent Log Becomes the Receipt, The Delegation Trace Becomes the Audit Boundary, The Incident Report Becomes Public Memory, The Context Window Becomes the Failure Archive, The First Task Becomes the Safety Gap, and AI Audit Trails.
