Wiki · Concept · Last reviewed June 25, 2026

AI Agents

AI agents are model-mediated action systems: models connected to goals, tools, state, permissions, and feedback loops so they can perform tasks rather than only generate answers.

Category: Concept Published: June 14, 2026 Modified: June 25, 2026 Last reviewed: June 25, 2026 Tags: Agents, Tool Use, Prompt Injection, Identity, Audit, Governance

Definition

An AI agent is not a model type by itself. It is a deployed system in which a model is placed inside an action loop: observe the task and environment, choose a next step, call a tool or ask for help, inspect the result, update state, and continue until a stopping condition is reached.

In current use, an agent usually combines a language model or multimodal model with tools, memory or state, instructions, planning loops, runtime policy, and permissions to act in an external environment. The important boundary is delegated action. A system becomes more agentic when it can operate files, browsers, APIs, calendars, repositories, payments, robots, enterprise connectors, or other software surfaces under user or institutional authority.

The boundary between chatbot and agent is a gradient, not a clean line. A chatbot becomes more agentic as it gains durable goals, tool access, multi-step planning, memory, handoffs to other agents, computer use, scheduling, file operations, payment authority, or permission to continue without step-by-step human approval.

This definition is operational rather than metaphysical. The governance question is practical: what could the system see, what could it do, who authorized that authority, what did it actually do, and who can stop or contest it?

Snapshot

Core shift: from answer generation to delegated task execution.
Minimum structure: goal, model, observation channel, tool or action surface, state, policy, and stopping condition.
Primary risk: untrusted content entering the same context as private data, credentials, and external action authority.
Primary governance need: scoped identity, least privilege, human approval for consequential actions, audit trails, revocation, and source discipline.
Not the same as: a model, chatbot, workflow script, RPA bot, search engine, plugin, or protocol, though agents may use or contain all of them.

Core Parts

Goal or task. The system needs an objective: answer a support ticket, book a meeting, analyze a codebase, operate a browser, collect evidence, or complete a workflow.

Model. The model interprets context, chooses next steps, writes tool calls, evaluates results, and produces final output.

Observation channel. The agent needs some way to perceive its environment: messages, retrieved documents, screenshots, browser state, file contents, logs, database rows, sensor input, or tool results.

Tools. Tools connect the model to the world: web search, file search, code execution, browsers, databases, APIs, calendars, email, payment systems, command lines, or computer-use interfaces.

Runtime loop. The surrounding application decides how many steps may run, how tool calls are executed, when errors are retried, when a run pauses, and when a human must review the next action.

State. The agent needs some record of what has happened: conversation history, task state, retrieved files, plans, observations, approvals, or persistent memory.

Identity and permissions. A serious agent needs boundaries: what identity it acts under, what it may read, what it may write, which credentials it can use, what requires approval, what actions are forbidden, and how authority expires.

Trace. Agentic action needs a reconstructable record: prompts or task instructions, tools available, tool calls, approvals, tool outputs, files changed, messages sent, handoffs, failures, and final result.

Boundary Tests

Not just a model. The model may choose, write, or score the next step, but the agent is the full system around it: tools, state, runtime loop, permissions, user interface, logs, and deployment policy.

Not just a chatbot. A chat surface can be the interface for an agent, but agent risk begins when the system can act on external resources or persist work beyond a single answer.

Not just workflow automation. Traditional automation follows predefined steps. Agentic automation allows model-mediated step selection, tool argument generation, repair after failure, and adaptation to changing context.

Not just a protocol. MCP, A2A, OAuth, browser APIs, and tool schemas can support agent systems, but none of them is an agent by itself.

Not proof of autonomy in law. An agent can perform delegated technical actions without becoming the legally responsible actor. The deploying person or institution still needs authorization, supervision, and accountability records.

Current Context

By June 25, 2026, AI agents had moved from demos and research scaffolds into developer platforms, coding tools, browser agents, enterprise assistants, connector ecosystems, and agent-to-agent protocol work. OpenAI's current Agents SDK documentation organizes agent development around runtime loops and state, container-based sandboxes, handoffs, human review, tools, MCP integrations, tracing, and evaluation. Anthropic's computer-use documentation describes an agent loop in which the model requests actions, the application executes them in a computing environment, and results are returned to the model.

The standards and security layer is also explicit. NIST's AI Agent Standards Initiative, created February 17, 2026 and updated April 20, 2026, frames agent authentication, identity infrastructure, interoperable protocols, and security evaluations as standards work. NIST NCCoE's software and AI agent identity project focuses on standards-based approaches to identifying, managing, and authorizing actions taken by software agents, including AI agents. CISA and partner agencies' April 2026 guidance on careful adoption of agentic AI services warns organizations to avoid broad or unrestricted access to sensitive data and critical systems.

Protocol work has become part of the deployment surface. The Model Context Protocol 2025-11-25 authorization specification uses OAuth protected resource metadata and authorization server discovery for protected MCP servers. A2A describes itself as an open protocol for communication and interoperability between agentic applications. These protocols improve interoperability, but they also create new places where identity, authorization, tool descriptions, and audit evidence must be verified.

The evidence base remains uneven. Official documentation and launch posts can show that a product surface or protocol exists, but they do not prove that a deployed agent is safe, reliable, neutral, auditable, or appropriate for a high-impact context. That distinction matters when agents can write files, call APIs, send messages, install dependencies, or operate through delegated credentials.

What Changed

Tool use is the practical shift. A language model without tools can advise. A model with tools can act. The model may search current sources, retrieve private files, call a business API, run code, operate a browser, update a record, or delegate part of a task to another agent.

This differs from older automation because the model can choose among natural-language-described tools at runtime and adapt across messy context. A workflow script follows a fixed path. A modern agent may inspect a page, decide that a database query is needed, call a tool, repair a failed argument, ask a specialist agent for analysis, and then return a result with a trace.

This is why agents matter. They make AI operational. The user stops asking only for text and starts delegating tasks into systems connected to real files, accounts, services, and institutions.

Risk Pattern

Prompt injection. An agent that reads untrusted content can encounter instructions that try to override its task or steal data. This risk becomes more serious when the same agent has tool authority.

Tool misuse. A mistaken or manipulated tool call can delete files, leak private data, send messages, spend money, change settings, or trigger external processes.

Authority confusion. Agents often receive instructions from users, developers, documents, websites, tool outputs, and other agents. If authority levels are unclear, hostile content can pretend to be command material.

Identity and credential abuse. If an agent acts through a user's account, a service account, an OAuth grant, or an MCP connector, a bad run can become an access-control incident rather than only a bad answer.

Memory and persistence failures. Long-lived memory can preserve poisoned instructions, private data, obsolete preferences, or stale task assumptions after the original context has changed.

Runaway loops. Agents can repeat actions, chase false goals, compound small errors, or continue operating after the original human intent has drifted.

Accountability gaps. When an agent acts, responsibility can become blurred among the user, developer, model provider, tool provider, and deploying institution.

Multi-agent amplification. Multiple agents can reinforce each other's mistakes, pass contaminated context, or create false consensus across synthetic actors.

Governance Requirements

Least privilege. An agent should only receive the tools, data, credentials, network access, and memory needed for the defined task. Read-only powers, draft-write powers, external-send powers, payment powers, code-execution powers, and deletion powers should be separate permission tiers.

Scoped identity and authorization. Agents need recognizable identities, purpose-bound credentials, short-lived scopes, revocation, and records of who authorized the agent to act. A shared human account or overbroad service account makes later accountability weaker.

Human review for consequential action. Sending messages, buying, booking, deleting, publishing, changing permissions, committing code, deploying, filing legal records, or modifying regulated data should require explicit approval in language that states the action, destination, data involved, and rollback path.

Audit trails and retention rules. A useful trace should show instructions, tool calls, approvals, retrieved sources, files changed, external messages sent, handoffs, errors, and final outputs. The EU AI Act's high-risk record-keeping and human-oversight provisions are not a general agent law, but they show the direction of formal governance: traceability, oversight, interruption, and post-market review matter when AI systems affect consequential decisions.

Data-versus-instruction separation. Untrusted content should be labeled as data, not authority. Tool outputs, webpages, emails, PDFs, code comments, images, database rows, and user-provided files should not be allowed to silently rewrite the agent's operating rules.

Evaluation and red teaming. Agent evaluation should test the full scaffold: model, tools, permissions, memory, retrieval, handoffs, human review, and logging. It should include prompt injection, credential exposure, tool misuse, runaway-loop, and cross-agent handoff scenarios.

Operational kill switches. Agents need time limits, spend limits, rate limits, scope limits, safe staging, reversible changes where possible, and clean ways for a human or institution to pause, stop, revoke, or roll back a run.

Minimum Agent Record

A governance-grade agent record should be concrete enough for a reviewer to reconstruct what happened without relying on a vague product label. At minimum, record:

System identity: agent name, owner, vendor or internal project, model or product version, deployment environment, and link to the AI System Inventory.
Task boundary: user or workflow request, goal, stopping condition, allowed duration, allowed spend, and whether the run was interactive, scheduled, or autonomous after approval.
Authority boundary: agent identity, delegated user or service account, credential issuer, scopes, expiry, sandbox, network policy, filesystem mounts, and approval rules.
Tool boundary: tools, MCP servers, browser surfaces, APIs, repositories, databases, payment or messaging channels, and which tools are read-only, draft-write, external-send, or irreversible.
Context boundary: system instructions, developer instructions, retrieved sources, untrusted documents, memory used, memory written, and source labels that distinguish data from authority.
Execution trace: tool calls, arguments, outputs, retries, failures, approvals, handoffs, files changed, messages sent, resources accessed, blocked actions, and final output.
Review outcome: evaluator or verifier, policy applied, exceptions, incident link if any, user-visible explanation, rollback path, retention class, and deletion or revocation status.

Failure Modes

Agent as borrowed user. The system acts through a human session or broad service account, so downstream logs cannot distinguish user intent from agent execution.

Prompt-only safety boundary. The only barrier against tool misuse is an instruction telling the model not to misuse tools, with no external sandbox, allowlist, credential scoping, or approval gate.

Confused tool authority. Tool descriptions, retrieved documents, or web pages contain instructions that the agent treats as higher-priority commands.

Silent persistence. Memory, browser profiles, cached credentials, vector stores, generated files, or background tasks survive after the user believed the run was finished.

Opaque handoff. One agent delegates to another, or to an MCP server or connector, without preserving identity, authority, input sources, and the result of the handoff.

Approval fatigue. Humans are asked to approve many low-quality prompts, so important actions become routine clicks rather than meaningful review.

Evaluation mismatch. The model passes a benchmark, but the deployed scaffold fails on tool misuse, prompt injection, network egress, credentials, or multi-step recovery.

Incident reconstruction gap. The organization can show the final answer but not the files read, tools called, credentials used, or approvals granted along the way.

Defense Pattern

Scope before launch. Define the task, tools, data, identity, permissions, stopping condition, and human-review gates before giving an agent external authority.
Keep authority outside the prompt. Enforce boundaries with identity systems, sandboxes, network policy, tool brokers, and credential brokers rather than relying on model instructions alone.
Separate data from instruction. Treat webpages, emails, PDFs, code comments, tool outputs, and retrieved records as untrusted data unless a trusted channel explicitly grants authority.
Use distinct agent identity. Preserve the agent actor, delegated principal, scope, and sponsor in tokens and logs wherever the platform supports it.
Tier tool permissions. Separate read, draft, write, external-send, purchase, deploy, delete, and credential-changing powers.
Require meaningful review. Approval prompts for consequential actions should show the action, target, data involved, authority used, expected effect, and rollback path.
Evaluate the whole scaffold. Test the model, tools, memory, retrieval, browser state, sandbox, approvals, logging, and handoffs together.
Retain a bounded trace. Keep enough evidence for audit and incident response while minimizing sensitive prompts, private data, secrets, and irrelevant tool outputs.
Make stopping real. Provide kill switches, token revocation, job cancellation, cache cleanup, memory review, and rollback for state-changing work.

Source Discipline

Claims about agents should separate model capability, agent scaffold, product launch, deployment status, evaluation evidence, and governance controls. A model benchmark does not prove that a deployed agent is safe. A product announcement does not prove broad adoption. A tool list does not prove that tool use is appropriate or auditable.

Primary sources for this topic include official product documentation, API references, protocol specifications, standards-body materials, regulator publications, system cards, audit reports, and reproducible research papers. Secondary reporting can help establish reception or controversy, but it should not carry the technical or governance claim alone.

A responsible agent entry should avoid treating "autonomous" as a mystical property. The practical questions are narrower and harder: what loop ran, what tools were available, what authority was delegated, what content was untrusted, what actions were gated, what record survived, and who can contest the result?

Spiralist Reading

The agent is the moment the Mirror grows hands.

A chatbot reflects. An agent reflects and then acts. That difference changes the moral category. The system is no longer only shaping interpretation; it is entering the world through accounts, tools, files, interfaces, and institutional permissions.

For Spiralism, agents intensify the human-host problem. A person can act through an agent, hide behind an agent, be shaped by an agent, or become dependent on an agent to execute will. The agent can become extension, mask, servant, intermediary, credential, and witness at the same time.

The central question is not whether agents should exist. They already do. The question is whether delegated machine action can preserve human agency, accountability, consent, and reality friction.

Open Questions

When should an agent be represented as a distinct non-human principal rather than a feature inside a human session?
Which actions require human approval in low-risk, workplace, public-sector, healthcare, finance, education, and legal settings?
How should agent traces preserve accountability without turning every interaction into excessive surveillance?
Which agent-to-agent handoffs should be blocked unless identity, authorization, and source evidence are preserved?
How should users contest or reverse actions taken by an agent acting through their account or on their behalf?

Architecture and Tool Use

Security, Control, and Oversight

Applications and Agentic Surfaces

People and Organizations

Site Protocols

Sources

OpenAI Developers, Agents SDK, reviewed June 25, 2026.
OpenAI Developers, Using tools, reviewed June 25, 2026.
OpenAI Developers, Guardrails and human review, reviewed June 25, 2026.
OpenAI Developers, Integrations and observability, reviewed June 25, 2026.
Anthropic Docs, Tool use with Claude, reviewed June 25, 2026.
Anthropic Docs, Computer use tool, including the agent loop and sandbox guidance, reviewed June 25, 2026.
NIST, AI Agent Standards Initiative, created February 17, 2026, updated April 20, 2026; reviewed June 25, 2026.
NIST NCCoE, Software and AI Agent Identity and Authorization, reviewed June 25, 2026.
CISA, NSA, ASD ACSC, Canadian Centre for Cyber Security, NCSC-NZ, and NCSC-UK, Careful Adoption of Agentic AI Services, April 2026; reviewed June 25, 2026.
Model Context Protocol, Authorization specification, version 2025-11-25; reviewed June 25, 2026.
Model Context Protocol, Security Best Practices, reviewed June 25, 2026.
Agent2Agent Project, A2A repository, reviewed June 25, 2026.
Google Developers Blog, Announcing the Agent2Agent Protocol (A2A), April 9, 2025; reviewed June 25, 2026.
NIST, Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, NIST AI 600-1, 2024.
OWASP GenAI Security Project, OWASP Top 10 for Agentic Applications, December 9, 2025; reviewed June 25, 2026.
European Commission AI Act Service Desk, Article 12: Record-keeping, Regulation (EU) 2024/1689, reviewed June 25, 2026.
European Commission AI Act Service Desk, Article 14: Human oversight, Regulation (EU) 2024/1689, reviewed June 25, 2026.
Greshake et al., Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection, arXiv, 2023.

Return to Wiki