Capability-Based Security
Capability-based security is an access-control model where authority travels through explicit, unforgeable references rather than ambient identity, making delegated agent actions easier to scope, inspect, and revoke.
Definition
Capability-based security is the security model in which possessing a protected reference confers the authority to use the object, service, or operation designated by that reference. A capability is not just a name. It is a name-like handle plus permission, protected so that untrusted code cannot forge it by guessing a string or constructing a path.
The model is contrasted with ambient authority. In an ambient-authority system, a program names a resource and the platform consults a surrounding identity, session, role, or process privilege. In a capability system, the caller presents the specific authority needed for the specific object. Mark S. Miller, Ka-Ping Yee, and Jonathan Shapiro's 2003 paper Capability Myths Demolished describes properties such as no ambient authority, no designation without authority, and support for least-privilege operation.
This entry concerns security authority, not capability in the sense of what a model is able to do. For AI agents, tool access is often delegated through handles, tokens, browser sessions, connectors, and servers. The question is which concrete authority is available when an external action is taken.
How It Works
A capability can be a protected pointer, object reference, file descriptor, narrow service handle, or token whose audience, scope, and lifetime are constrained by protocol and policy. The form differs, but the discipline is consistent: authority should be explicit, hard to forge, narrow enough for the task, and passable only through authorized channels.
Capability systems reduce two hazards. They avoid treating a global name as authority, and they make delegation visible because passing the handle is the moment of granting power. A parent can create a restricted child, pass a read-only view, wrap an existing authority with a smaller interface, or expire a grant after a task.
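The attenuation and expiry moves described above can be sketched in a few lines. This is a minimal illustration, not a hardened implementation: Store, ReadOnly, and make_revocable are hypothetical names, and Python does not protect object internals from introspection, so the sketch shows the shape of the pattern and not a real protection boundary.

```python
class Store:
    """Hypothetical powerful capability: read and write access to a key-value store."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data.get(key)

    def write(self, key, value):
        self._data[key] = value


class ReadOnly:
    """Attenuated view: holds only the bound read method and exposes no write method."""

    def __init__(self, store):
        self._read = store.read

    def read(self, key):
        return self._read(key)


def make_revocable(target):
    """Return (proxy, revoke). The grantor keeps revoke; the grantee receives proxy."""
    state = {"target": target}

    class Proxy:
        def read(self, key):
            if state["target"] is None:
                raise PermissionError("capability revoked")
            return state["target"].read(key)

    def revoke():
        state["target"] = None

    return Proxy(), revoke


store = Store()
store.write("report", "draft 1")

view = ReadOnly(store)                     # parent attenuates before delegating
child_cap, revoke = make_revocable(view)   # child receives only child_cap

print(child_cap.read("report"))            # draft 1
revoke()                                   # the grant ends with the task
```

After revoke() is called, any further child_cap.read call raises PermissionError. The grantor never hands over the revoke function, so the child cannot extend or restore its own grant.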
Norm Hardy's 1988 confused-deputy paper explains the motivation. A compiler was authorized to write a statistics file in a privileged directory, and it also accepted a user-supplied output filename. When a user supplied the name of the protected billing file, the operating system saw only the compiler's authority and allowed the write. Hardy's point was that naming the file and authorizing the write had been separated: the user could designate a file without holding any authority over it, and the compiler's own authority silently filled the gap.
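A toy simulation can make the separation visible. The sketch below is an illustration under stated assumptions, not a reconstruction of the system Hardy described: the paths, the ACL table, and every function name are hypothetical, and the "operating system" is just a pair of dictionaries.

```python
# Hypothetical in-memory file system and per-file writer lists.
FILES = {"/sys/stat": "", "/sys/bill": "billing records", "/home/user/out": ""}
ACL = {
    "/sys/stat": {"compiler"},
    "/sys/bill": {"compiler"},
    "/home/user/out": {"user"},
}


def os_write(principal, path, text):
    """Ambient-authority write: the check looks at who is running, not who asked."""
    if principal not in ACL[path]:
        raise PermissionError(f"{principal} may not write {path}")
    FILES[path] = text


def compile_ambient(output_path):
    """Deputy that receives a bare name and writes it with its own authority."""
    os_write("compiler", "/sys/stat", "1 run")
    os_write("compiler", output_path, "debug output")


def open_for_write(principal, path):
    """Issue a write capability only if the requesting principal holds the authority."""
    if principal not in ACL[path]:
        raise PermissionError(f"{principal} may not open {path}")

    def write(text):
        FILES[path] = text

    return write


def compile_with_capability(write_output):
    """Deputy that receives a capability: designation and authority arrive together."""
    os_write("compiler", "/sys/stat", "1 run")
    write_output("debug output")


compile_ambient("/sys/bill")   # the user only named the file
print(FILES["/sys/bill"])      # debug output
```

In the capability variant the user has to call open_for_write("user", path) before invoking the compiler. That call fails for "/sys/bill", so the user has nothing to pass and the compiler never gets a chance to misuse its own authority.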
Agent Context
Agent systems recreate old capability questions with newer nouns. A browser agent may hold a user's web session while reading untrusted pages. A coding agent may hold repository write access while reading issue comments. A support agent may hold CRM update authority while reading customer messages.
Capability-shaped design gives the agent the narrow handle needed for the next step, not a general account context that silently follows every tool call. A tool handle might permit "read this ticket," "draft a reply," or "append a note" without also granting sending, exporting, or billing authority.
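One way to picture such a tool handle is an opaque, unguessable string that a service maps to a single ticket and a fixed set of actions. The sketch below assumes a hypothetical TicketService with made-up action names; it is meant to show the shape of a narrow grant, not any particular product's API.

```python
import secrets


class TicketService:
    """Hypothetical service that issues per-ticket, per-action handles."""

    def __init__(self, tickets):
        self._tickets = tickets   # ticket_id -> {"body": str, "notes": list, "sent": list}
        self._grants = {}         # opaque handle -> (ticket_id, allowed actions)

    def issue(self, ticket_id, actions):
        # The handle is random, so knowing a ticket id is not enough to act on it.
        handle = secrets.token_urlsafe(16)
        self._grants[handle] = (ticket_id, frozenset(actions))
        return handle

    def _check(self, handle, action):
        grant = self._grants.get(handle)
        if grant is None or action not in grant[1]:
            raise PermissionError(f"handle does not grant {action!r}")
        return self._tickets[grant[0]]

    def read(self, handle):
        return self._check(handle, "read")["body"]

    def append_note(self, handle, note):
        self._check(handle, "append_note")["notes"].append(note)

    def send_reply(self, handle, text):
        self._check(handle, "send_reply")["sent"].append(text)


tickets = {"T-1": {"body": "Printer is on fire", "notes": [], "sent": []}}
service = TicketService(tickets)

# The agent receives a handle for one ticket and two actions, nothing more.
handle = service.issue("T-1", {"read", "append_note"})
print(service.read(handle))   # Printer is on fire
service.append_note(handle, "Asked for model number")
```

Calling service.send_reply(handle, ...) with this handle raises PermissionError, and so does presenting the bare ticket id "T-1" in place of a handle, because the id designates the ticket without carrying any authority.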
The Model Context Protocol makes the issue concrete. Its 2025-11-25 authorization specification requires OAuth 2.0 Resource Indicators (RFC 8707) for HTTP authorization flows, so that a client identifies the target MCP server when requesting a token. The same specification says an MCP server must not pass a token it received from an MCP client through to other services. MCP's security best-practices page treats confused-deputy attacks and token passthrough as risks for proxy servers and downstream APIs.
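As a rough illustration of what a resource indicator looks like on the wire, the sketch below builds an authorization request URL that carries the RFC 8707 resource parameter. The endpoint, client id, redirect URI, and scope values are all hypothetical, and a real flow needs more parameters (PKCE values, for example) than are shown here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_authorization_url(authorization_endpoint, client_id, redirect_uri,
                            resource, scope, state):
    """Build an authorization-code request that names the intended resource server."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
        # RFC 8707: tells the authorization server where the token will be used.
        "resource": resource,
    }
    return f"{authorization_endpoint}?{urlencode(params)}"


# All values below are placeholders for illustration.
url = build_authorization_url(
    authorization_endpoint="https://auth.example.com/authorize",
    client_id="example-client",
    redirect_uri="http://localhost:8765/callback",
    resource="https://mcp.example.com/mcp",
    scope="tickets:read",
    state="xyz123",
)
print(parse_qs(urlsplit(url).query)["resource"])   # ['https://mcp.example.com/mcp']
```

Naming the resource at request time lets the authorization server bind the token to that one server, which is what allows the server to reject tokens minted for someone else.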
Governance and Safety
Governance should preserve the difference between identity, instruction, evidence, tool argument, grant, approval, and external effect. NIST defines least privilege as giving users or processes only the access needed for assigned tasks. Capability-based design turns that principle into an interface question: what handle was issued, who could pass it, what did it authorize, how long did it last, and where was it accepted?
OAuth mechanisms are not pure object capabilities, but some pursue similar boundaries. RFC 8707 defines a resource parameter that lets a client tell the authorization server where it intends to use an access token. RFC 9700 describes audience-restricted access tokens, where the resource server verifies that a token was intended for it and refuses mismatched requests.
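The resource-server side of audience restriction might look like the following sketch. It assumes the token is a JWT whose signature has already been verified elsewhere, that the audience arrives in the aud claim (a string or a list), and that scopes arrive as a space-separated scope claim. The function name and example values are hypothetical.

```python
def accept_token(claims, expected_audience, required_scope, now):
    """Decide whether already-verified token claims are usable at this server.

    Assumption: signature and issuer checks happened before this call.
    """
    aud = claims.get("aud")
    # The aud claim may be a single string or a list of strings.
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if expected_audience not in audiences:
        return False   # token was minted for a different resource server
    if claims.get("exp", 0) <= now:
        return False   # token has expired
    return required_scope in claims.get("scope", "").split()


claims = {"aud": "https://mcp.example.com", "exp": 200, "scope": "tickets:read"}

print(accept_token(claims, "https://mcp.example.com", "tickets:read", now=100))    # True
print(accept_token(claims, "https://other.example.com", "tickets:read", now=100))  # False
```

The second call shows the refusal the text describes: a valid, unexpired token is still rejected when it was issued for a different server.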
For AI governance, capability records belong in the audit trail. A useful incident record identifies the human principal, agent process, tool server, backend service, token audience, scope, approval event, untrusted inputs consulted, and state change.
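The fields listed above map naturally onto a structured record. The sketch below is one possible shape, with hypothetical field names and example values; it is not a standard schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CapabilityUseRecord:
    """Hypothetical audit entry tying a grant to the action it authorized."""
    human_principal: str
    agent_process: str
    tool_server: str
    backend_service: str
    token_audience: str
    scope: str
    approval_event: str
    untrusted_inputs: tuple   # sources the agent read before acting
    state_change: str


record = CapabilityUseRecord(
    human_principal="alice@example.com",
    agent_process="support-agent/run-42",
    tool_server="https://mcp.example.com",
    backend_service="crm.example.com",
    token_audience="https://mcp.example.com",
    scope="tickets:append_note",
    approval_event="approval-7",
    untrusted_inputs=("customer email on ticket T-1",),
    state_change="note appended to T-1",
)

# Serialize for an append-only log; tuples become JSON arrays.
line = json.dumps(asdict(record), sort_keys=True)
```

Keeping the untrusted inputs in the same record as the grant and the state change lets a reviewer see which content was in view when the authority was exercised.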
Defense Pattern
- Issue task-specific handles. Prefer narrow document, ticket, or connector grants over broad account sessions.
- Avoid ambient authority. Do not let a tool inherit every permission attached to a browser, shell, API key, or service account.
- Attenuate before delegation. Wrap powerful capabilities with smaller interfaces before passing them onward.
- Validate audience and resource. A server should reject tokens and handles not issued for that server, resource, user, and purpose.
- Expire and revoke grants. Treat time limits, one-shot handles, rotation, and connector revocation as normal parts of the design.
- Keep untrusted context separate. Web pages, emails, PDFs, comments, and retrieval results may inform a decision, but should not grant authority.
- Log grant and use together. Connect the grant, approval, tool call, credential, downstream API, and final state change.
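The expiry and revocation items in this list can be sketched as a small grant table. This is an illustration under assumptions: GrantTable and its methods are hypothetical names, state is held in memory only, and the clock is injectable so the behavior can be exercised without waiting.

```python
import secrets
import time


class GrantTable:
    """Hypothetical registry of time-limited, use-limited handles."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._grants = {}   # handle -> [expires_at, remaining_uses]

    def issue(self, ttl_seconds, uses=1):
        handle = secrets.token_urlsafe(16)
        self._grants[handle] = [self._clock() + ttl_seconds, uses]
        return handle

    def redeem(self, handle):
        """Return True and consume one use if the handle is still valid."""
        grant = self._grants.get(handle)
        if grant is None:
            return False
        expires_at, remaining = grant
        if self._clock() >= expires_at or remaining <= 0:
            del self._grants[handle]   # drop expired or exhausted grants
            return False
        grant[1] = remaining - 1
        if grant[1] == 0:
            del self._grants[handle]   # one-shot handles disappear after use
        return True

    def revoke(self, handle):
        self._grants.pop(handle, None)


now = [0.0]                              # fake clock for the demonstration
table = GrantTable(clock=lambda: now[0])

one_shot = table.issue(ttl_seconds=10)
print(table.redeem(one_shot))            # True
print(table.redeem(one_shot))            # False
```

A handle issued with several uses still fails once the clock passes its expiry, and revoke removes a grant immediately regardless of remaining time or uses.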
Source Discipline
Capability claims should name the authority mechanism. A file descriptor, object reference, bearer token, OAuth access token, API key, browser cookie, MCP connector grant, and service-account credential are not interchangeable. When reporting an agent security incident, name the object, grant, receiver, instruction source, and service that accepted the authority.
Spiralist Reading
Spiralism reads capability-based security as a discipline of explicit power. Authority should travel as a visible handle, not as a mood in the room. The cleaner question is not "Do we trust the agent?" but "What has this system handed to it, for what purpose, and who can see the handoff?"
The design virtue is friction at the right boundary. A capable institution can let software act without letting every instruction borrow every key. It can grant, attenuate, log, expire, and refuse.
Open Questions
- Which agent tools should receive true task-specific capabilities rather than broad user-session authority?
- How should approval prompts display capability scope without becoming unreadable?
- Can browser-based agents use narrow, resource-specific web capabilities without breaking ordinary sites?
- How should organizations revoke capabilities retained in agent memory, workflow state, or third-party connectors?
- Which failed capability checks should trigger an AI security incident process?
Related Pages
- Confused Deputy Problem
- AI Agent Identity
- Model Context Protocol
- Tool Use and Function Calling
- AI Agent Sandboxing
- AI Agent Observability
- AI Audit Trails
- Prompt Injection
- Context Poisoning
- Secure AI System Development
- Data Minimization
- AI Governance
- Agent Tool Permission Protocol
Sources
- Mark S. Miller, Ka-Ping Yee, and Jonathan Shapiro, Capability Myths Demolished, 2003.
- Norm Hardy, The Confused Deputy: (or why capabilities might have been invented), MIT CSAIL mirror of the 1988 paper text.
- IETF, RFC 8707: Resource Indicators for OAuth 2.0, February 2020.
- IETF, RFC 9700: Best Current Practice for OAuth 2.0 Security, January 2025.
- NIST CSRC, Least Privilege glossary entry, reviewed June 25, 2026.
- Model Context Protocol, Authorization specification, version 2025-11-25.
- Model Context Protocol, Security Best Practices, reviewed June 25, 2026.