Agent Audit and Incident Review
A protocol for making AI agent work reviewable after the run is over. Prompt hardening sets the instruction boundary. Tool permissioning sets the action boundary. Audit practice makes both boundaries visible.
The institution should assume that any AI agent with tools will eventually do something surprising. The question is not whether surprise can be eliminated. The question is whether the surprise leaves enough evidence for correction, care, and accountability.
An agent run without a trace is rumor. An agent run with a trace can be reviewed.
The Rule
No consequential agent action is complete until it can be reconstructed.
The record must be good enough for a reviewer to answer:
- what the agent was asked to do;
- what authority it was given;
- what sources it touched;
- what tools it called;
- what it changed;
- what it sent outside the workspace;
- which human approved the consequential step;
- what failed, drifted, or surprised the operator.
If the answer depends on memory, vibes, or a copied chat fragment, the workflow is not mature enough for consequential use.
Minimum Run Record
Every agent-assisted workflow should create a run record.
Record:
| Field | Purpose |
|---|---|
| Run ID | Unique identifier for the agent run |
| Date and operator | Who initiated the run |
| Workflow | Research, drafting, build, support, CRM, archive, media, finance |
| Agent identity | Model, app, account, service account, or vendor |
| Task brief | The original human request |
| Permission class | The class from Agent Tool Permission Protocol |
| Allowed tools | Tools the agent was permitted to use |
| Sources touched | Public URLs, local docs, internal records, restricted records |
| Tool calls | Search, read, edit, send, deploy, database, shell, MCP, connector |
| Human gates | Approvals requested, approved, denied, or skipped |
| Outputs | Drafts, files, messages, summaries, tickets, commits, deployments |
| Exceptions | Errors, refusals, guardrail trips, suspected injection, drift |
| Reviewer | Person responsible for closing the run |
For low-risk public drafting, this can be a small note in the work log. For agent runs with write, send, publish, payment, CRM, archive, or shell access, it should be a durable record.
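The run record above can be sketched as a small data structure. This is a minimal illustration, not a prescribed schema: the field names mirror the table, and the permission classes and helper method are assumptions for the sketch.

```python
from dataclasses import dataclass, field

@dataclass
class RunRecord:
    # Identity of the run and who stands behind it.
    run_id: str
    date: str                 # ISO 8601 date of the run
    operator: str             # who initiated the run
    workflow: str             # e.g. "research", "drafting", "support"
    agent_identity: str       # model, app, account, service account, or vendor
    task_brief: str           # the original human request
    permission_class: str     # class from the Agent Tool Permission Protocol
    # What the agent was allowed to do and what it actually did.
    allowed_tools: list[str] = field(default_factory=list)
    sources_touched: list[str] = field(default_factory=list)
    tool_calls: list[str] = field(default_factory=list)
    human_gates: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    exceptions: list[str] = field(default_factory=list)
    reviewer: str = ""        # person responsible for closing the run

    def is_closed(self) -> bool:
        # A run is reviewable only when someone is accountable for closing it.
        return bool(self.reviewer)
```

For low-risk drafting this could be filled in by hand; for consequential runs it would be written by the tooling itself.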
Trace Requirements
Where the platform supports traces, preserve the structure of the run rather than only the final answer.
Capture:
- prompts and task brief;
- model generations needed to explain decisions;
- tool-call names;
- tool-call inputs and outputs;
- handoffs between agents;
- guardrail checks;
- human approval gates;
- files changed;
- external destinations;
- timestamps;
- error states.
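A single trace event covering the capture list above might look like the following sketch. A real platform would emit such events automatically; the event kinds and field names here are illustrative assumptions.

```python
import json
import time

def trace_event(run_id: str, kind: str, detail: dict) -> str:
    """Serialize one trace event: tool call, handoff, guardrail check, gate, or error."""
    event = {
        "run_id": run_id,
        "kind": kind,              # e.g. "tool_call", "handoff", "guardrail", "gate", "error"
        "timestamp": time.time(),  # timestamps are part of the required capture list
        "detail": detail,          # e.g. tool name, inputs and outputs, external destination
    }
    return json.dumps(event)
```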
Do not capture more sensitive material than needed. Tracing can itself become a data store. If a trace includes testimony, private contact records, donor records, care-circle notes, minor material, credentials, or incident records, the trace inherits the classification of the most sensitive material it contains.
Default position: traces for public research may contain source text and tool summaries. Traces for restricted workflows should avoid raw content where a reference, hash, record ID, or redacted excerpt is enough.
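The inheritance rule — a trace takes the classification of the most sensitive material it contains — can be stated mechanically. The class names and their ordering below are assumptions for illustration, not a fixed taxonomy.

```python
# Ordered from least to most sensitive; a trace inherits the maximum.
LEVELS = ["public", "internal", "restricted"]

def trace_classification(item_levels: list[str]) -> str:
    """Return the highest classification among the items a trace contains."""
    if not item_levels:
        return "public"
    return max(item_levels, key=LEVELS.index)
```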
Redaction Standard
The audit trail should not become a second breach.
Redact or avoid storing:
- credentials, API keys, tokens, recovery codes, and private links;
- personal phone numbers, email addresses, addresses, and account handles unless required for the review;
- raw testimony not approved for the reviewer;
- minor names and identifying details;
- donor payment details;
- legal correspondence;
- care-circle notes;
- medical, mental-health, or crisis details beyond the minimum needed for a safety record.
When redaction changes the review value of the record, note that a fuller restricted record exists and identify who may access it.
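A minimal sketch of the redaction standard, assuming two deliberately coarse patterns. These are examples only, not a complete detector for credentials or personal details; a `reference` helper shows the hash-instead-of-raw-content approach from the default position above.

```python
import hashlib
import re

# Coarse illustrative patterns; a real redactor would cover far more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
API_KEY = re.compile(r"\b(sk|pk|api)[-_][A-Za-z0-9]{16,}\b")

def redact(text: str) -> str:
    """Strip obvious credentials and contact details from a trace line."""
    text = API_KEY.sub("[REDACTED-CREDENTIAL]", text)
    text = EMAIL.sub("[REDACTED-EMAIL]", text)
    return text

def reference(raw: str) -> str:
    """Store a short hash instead of raw sensitive content."""
    return "sha256:" + hashlib.sha256(raw.encode()).hexdigest()[:16]
```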
Incident Triggers
Escalate an agent run to incident review when any of these occur:
- the agent publishes, sends, deletes, pays, changes permissions, deploys, or modifies a database without the required approval;
- the agent accesses data outside its permission class;
- untrusted content appears to redirect the agent’s goal;
- a tool call exposes private or internal data to an external destination;
- the agent installs or enables a plugin, skill, connector, package, extension, or MCP server without review;
- a trace shows a repeated attempt to bypass rules;
- a guardrail or tripwire fires on a consequential workflow;
- the agent creates false institutional statements;
- the agent invents citations or fabricates source claims in a public piece;
- a member, donor, volunteer, source, or outside party reports harm or surprise;
- the operator cannot explain what happened.
When in doubt, open a small incident note. A small note can be closed. An unlogged failure cannot be repaired.
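The escalation decision is a simple any-match check. The trigger names below summarize the list above and are illustrative; real checks would inspect the run trace, not pre-labeled flags.

```python
# One name per trigger in the incident list above (illustrative labels).
TRIGGERS = {
    "unapproved_consequential_action",
    "data_outside_permission_class",
    "goal_redirected_by_untrusted_content",
    "private_data_to_external_destination",
    "unreviewed_plugin_or_connector",
    "repeated_rule_bypass_attempt",
    "guardrail_trip_on_consequential_workflow",
    "false_institutional_statement",
    "fabricated_citation_in_public_piece",
    "external_harm_report",
    "operator_cannot_explain",
}

def needs_incident_review(observed: set[str]) -> bool:
    """Escalate when any trigger fires; when in doubt, open a small note."""
    return bool(observed & TRIGGERS)
```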
Incident Review Form
Use this form for agent incidents.
Incident ID:
Date opened:
Reporter:
Workflow:
Agent/system:
Operator:
Permission class:
Tools involved:
Data involved:
External destination:
What happened:
Expected behavior:
Actual behavior:
Human gates present:
Human gates missed:
Immediate containment:
People affected:
Records preserved:
Root cause:
Policy change:
Prompt change:
Permission change:
Tool change:
Reviewer:
Date closed:
Do not turn incident review into blame theater. The purpose is to preserve evidence, contain harm, repair what can be repaired, and lower the chance of repeat failure.
Weekly Agent Review
Each week that agents are used, review a small sample.
Review:
- all agent runs with consequential actions;
- all guardrail trips;
- all denied approval requests;
- any run that touched internal or restricted material;
- any run with unexpected tool use;
- one random low-risk public run.
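The sampling rule above can be sketched as a selection function. The run fields (`consequential`, `guardrail_trips`, and so on) are hypothetical names standing in for whatever the run record actually stores.

```python
import random

def weekly_sample(runs: list[dict], rng: random.Random) -> list[dict]:
    """Select every run the review rules require, plus one random low-risk run."""
    must_review = [
        r for r in runs
        if r.get("consequential")
        or r.get("guardrail_trips")
        or r.get("denied_approvals")
        or r.get("data_class") in ("internal", "restricted")
        or r.get("unexpected_tool_use")
    ]
    low_risk = [r for r in runs if r not in must_review]
    if low_risk:
        # One random low-risk public run keeps the baseline honest.
        must_review.append(rng.choice(low_risk))
    return must_review
```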
Ask:
- Did the run stay inside the original task?
- Did every tool call match the allowlist?
- Was the permission class correct?
- Were approvals specific enough?
- Did the trace omit necessary evidence?
- Did the trace store unnecessary sensitive data?
- Did the output cite sources honestly?
- Did the agent claim authority it did not have?
- Did the operator rely on the agent past the verification boundary?
- What should change before the next run?
The review should produce changes to prompts, permissions, tool registers, or training notes. If review produces no changes for months, the review is probably too passive.
Guardrail Feedback Loop
Audit is not an archive of embarrassment. It is a tuning loop.
For each material failure, decide whether the correction belongs in:
- the system prompt;
- the task template;
- the tool allowlist;
- the tool guardrail;
- the human approval gate;
- the data classification policy;
- operator training;
- vendor configuration;
- incident protocol;
- public correction.
Do not fix a permission failure only with better wording. If an agent had a tool it should not have had, remove or narrow the tool. If an agent touched data it should not have touched, change access. If an agent repeatedly drifts when reading untrusted material, add blocking checks before the tool call, not only after the final answer.
Public Correction Rule
When agent-assisted work creates a public error, correct the public record.
Correction notes should state:
- what was wrong;
- when it was corrected;
- whether AI assistance contributed to the error;
- which source or process now supports the corrected claim.
Do not use “AI error” as a way to avoid responsibility. The institution published the work; the institution corrects it.
Retention
Suggested retention:
| Record | Default retention |
|---|---|
| Low-risk public drafting run | 90 days |
| Public research trace used for publication | 1 year |
| Consequential action run | 3 years |
| Finance, donor, legal, or governance agent run | Match the governing record schedule |
| Incident review | Permanent or board-defined archival term |
| Restricted testimony trace | Avoid if possible; otherwise follow testimony consent and privacy policy |
Retention must follow Privacy and Data Stewardship. Traces should not silently outlive the record class they contain.
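The retention table maps to a simple expiry check. The day counts below restate the suggested defaults; records governed by an external schedule (finance, incidents, restricted testimony) are deliberately never expired by this sketch.

```python
# Suggested default retention windows from the table above, in days.
RETENTION_DAYS = {
    "low_risk_public_drafting": 90,
    "public_research_trace": 365,
    "consequential_action_run": 3 * 365,
}

def is_expired(record_class: str, age_days: int) -> bool:
    """True when a trace has outlived its suggested retention window."""
    limit = RETENTION_DAYS.get(record_class)
    if limit is None:
        # Finance, donor, legal, incident, and testimony records follow
        # their own governing schedule; never silently expire them here.
        return False
    return age_days > limit
```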
Spiralism Policy
Spiralism agents with tool access must leave a reviewable run record. Any agent that can publish, deploy, send, delete, modify records, change permissions, make purchases, contact outsiders, or access restricted material must have explicit trace, approval, and incident-review handling before use.
This protocol pairs with:
- Agent Prompt Hardening;
- Agent Tool Permission Protocol;
- Digital Infrastructure and Security;
- Privacy and Data Stewardship;
- Research and Editorial Integrity;
- Incident and Complaint Protocol.
Sources Checked
- OpenAI Agents SDK, Tracing, accessed May 2026.
- OpenAI Agents SDK, Guardrails, accessed May 2026.
- OWASP GenAI Security Project, OWASP GenAI Security Project Releases Top 10 Risks and Mitigations for Agentic AI Security, December 9, 2025.
- NIST, AI Risk Management Framework, accessed May 2026.