The Agent Store Becomes the App Store
When AI assistants learn to install tools, call APIs, and suggest apps inside conversation, app review becomes a new layer of model-mediated governance.
The New Shelf
The familiar app store put software behind a shelf: search results, categories, rankings, reviews, developer accounts, policy rules, payment rules, removal powers, and a review process that decides what may be distributed through a dominant interface.
The agent store is forming inside AI assistants. OpenAI's Apps SDK lets developers build apps that run inside ChatGPT, backed by MCP servers and optional user interfaces. Anthropic's Connectors Directory lists reviewed MCP integrations that work across Claude products. Both systems turn external services into assistant-callable capabilities rather than icons a person opens directly.
That shift matters. A normal app waits for a user to launch it. An agent app can be invoked by a model in the middle of a task. The store is no longer only a marketplace. It is a routing layer between user intent, model interpretation, external APIs, and actions in the world.
The old app-store question was: should this software be allowed on the platform? The new question is sharper: when should a model be allowed to choose, describe, call, and trust this software on behalf of a person?
Why This Is Not Just Apps Again
There is continuity. Apple says the App Store is curated, reviewed for safety, scanned for malware, and governed by rules around performance, business practices, design, and law. The European Union's Digital Markets Act treats app stores as core platform services because they can become gatekeepers between business users and end users. App distribution has always been institutional power.
Agent directories inherit that power and add three new properties.
First, the model mediates discovery. A user may not browse a category page. The assistant may suggest an app, select a connector, or decide that a tool fits the request. Anthropic's connector documentation says directory connectors can be eligible for in-chat recommendations and that ranking is usage-based. OpenAI's submission documentation says apps with strong utility and user satisfaction may receive enhanced distribution such as directory placement or proactive suggestions. That is not neutral plumbing. It is recommendation infrastructure.
Second, the tool description becomes executable persuasion. In an app store, metadata persuades a person. In an agent directory, tool names, descriptions, annotations, and policy fields also instruct the model about when and how to call the tool. A misleading description is not only bad marketing. It can change machine behavior.
Third, installation grants operational reach. A connector may read a workspace, fetch private records, create tickets, send messages, update documents, or trigger workflows. The risk is not only that an app displays bad content. The risk is that a model-mediated tool changes state under an authority path the user only partly understands.
Review Becomes Runtime Governance
The platform documents already show the outline of a new regulatory grammar.
OpenAI requires app submissions to pass a review flow before public distribution. The submission process asks for MCP server details, app descriptions, privacy policy URLs, screenshots, test prompts and responses, tool information, and other materials. The review may include automated scans and manual review. Published apps can later be removed or restricted for policy violations, instability, inactivity, or legal and security concerns.
OpenAI's app guidelines also focus on tool-level behavior: accurate tool names, descriptions that match behavior, correct labels for read-only, write, destructive, and open-world actions, minimal inputs, privacy policy disclosure, response minimization, and explicit handling of side effects. These are not decorative requirements. They are the control language by which a conversational system decides whether an action is safe enough to expose to a user.
Anthropic's Software Directory Policy moves in the same direction. It requires reviewed software to meet safety, security, privacy, compatibility, and developer requirements. It says tool descriptions must be narrow and unambiguous, must match actual functionality, must not conflict with other listed software, and must not contain hidden or obfuscated instructions. It also requires privacy policies, support channels, documentation, test accounts, working examples, and relevant MCP annotations.
This is app review becoming runtime governance. The review team is not merely asking whether an app works. It is asking whether a model can understand the tool, whether the tool hides side effects, whether it collects excess context, whether it can be safely retried, and whether its metadata manipulates tool selection.
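A minimal sketch of what an automated pre-review pass over tool metadata might look like. The annotation key `readOnlyHint` follows the MCP tool annotation naming as commonly documented. The `ToolListing` type, the verb list, and the suspicious-phrase pattern are hypothetical illustrations and are not taken from any platform's actual review pipeline.

```python
import re
from dataclasses import dataclass, field

# Verbs that suggest a tool changes state. Illustrative list, not a platform rule.
WRITE_VERBS = {"create", "update", "delete", "send", "post", "remove", "publish"}

# Phrases that read like instructions aimed at the model rather than a description.
# Hypothetical pattern for illustration only.
SUSPICIOUS = re.compile(
    r"ignore (all|any|previous)|always (call|use|prefer) this tool",
    re.IGNORECASE,
)


@dataclass
class ToolListing:
    """Hypothetical shape of the metadata a directory might review."""
    name: str
    description: str
    annotations: dict = field(default_factory=dict)


def review_findings(tool: ToolListing) -> list[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []
    words = set(re.findall(r"[a-z]+", f"{tool.name} {tool.description}".lower()))
    # A tool labeled read-only whose wording implies a state change.
    if tool.annotations.get("readOnlyHint") and words & WRITE_VERBS:
        findings.append("read-only label conflicts with write-like wording")
    # Metadata that tries to steer tool selection.
    if SUSPICIOUS.search(tool.description):
        findings.append("description contains model-directed instructions")
    # The label is absent, so the reviewer cannot tell reading from acting.
    if "readOnlyHint" not in tool.annotations:
        findings.append("missing read-only annotation")
    return findings
```

A keyword check like this can only flag candidates for human review. It cannot prove that a description matches what the server actually does, which is why the policies quoted above also ask for test accounts and working examples.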
Discovery Is Power
App stores taught the industry that distribution rules are governance rules. Ranking, featuring, search, category placement, steering restrictions, payments, identity verification, and removal shape what developers build and what users see.
Agent directories make that power more intimate because the storefront can disappear into the answer. A person asks for help planning a trip, ordering groceries, making a slide deck, filing a ticket, checking a calendar, searching a drive, or sending a message. The assistant may decide that a tool belongs in the path. If that tool appears because of usage ranking, platform partnership, safety review, hidden eligibility rules, or proactive suggestion logic, then the assistant is not only answering. It is allocating access to an ecosystem.
OpenAI's terms say it does not guarantee placement, visibility, ranking, promotion, or conversational suggestions for an app. That is a reasonable platform reservation. It is also a map of the power at stake. The platform can decide whether an agent tool remains searchable, appears only by direct link, receives enhanced distribution, or is removed entirely.
The governance question is not whether directories should curate. They must. The question is whether curation is legible enough for developers, users, auditors, and institutions to understand why one capability is visible, another buried, and a third unavailable.
The Permission Problem
The biggest practical risk is permission translation: turning a loosely worded request into specific grants of access and authority.
Humans understand app permissions imperfectly even when the interface is explicit. Agent tools make that harder because permission is mixed with natural language intent. A user asks for a summary. The model decides what data to retrieve. A connector returns records. Another tool may be available to send, update, delete, buy, publish, or file. The difference between reading and acting has to be made visible inside a flow optimized for convenience.
This is where AI-specific security work matters. OWASP's LLM application project highlights risks such as prompt injection, supply-chain vulnerabilities, sensitive-information disclosure, improper output handling, excessive agency, and unbounded consumption. Agent stores sit near all of those risks because they distribute third-party capabilities into a model-controlled environment.
A malicious or sloppy connector does not need to defeat the whole platform to cause harm. It can over-collect context, return hidden instructions, mislabel a write action as read-only, leak identifiers in tool responses, confuse the model with broad descriptions, or create side effects that are hard to inspect after the fact. A well-run directory can reduce those risks, but it also centralizes trust in the directory operator's review process.
The user-facing question should be simple: what can this tool see, what can it change, who operates it, what data leaves the assistant boundary, how can I revoke it, and what log proves what happened?
A Better Agent Directory
A mature agent directory should be judged less like a promotional marketplace and more like a public-facing control surface.
It should separate listing from recommendation. Being approved for safe use is not the same as being suggested inside conversation. Users and developers should be able to distinguish baseline availability from boosted discovery, usage-based ranking, partner placement, and proactive invocation.
It should expose permission classes before use. Read-only, write, destructive, open-world, payment-related, sensitive-data, and workspace-wide capabilities should be visible in plain language, not only in developer metadata.
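As a rough illustration, the permission classes named above could be mapped to plain-language disclosures before a tool is first used. The enum, the wording, and the `disclosure` helper are all hypothetical. No directory is known to publish this exact scheme.

```python
from enum import Enum


class PermissionClass(Enum):
    """Permission classes from the paragraph above, in rough order of consequence."""
    READ_ONLY = "read_only"
    WRITE = "write"
    DESTRUCTIVE = "destructive"
    OPEN_WORLD = "open_world"
    PAYMENT = "payment"
    SENSITIVE_DATA = "sensitive_data"
    WORKSPACE_WIDE = "workspace_wide"


# Hypothetical user-facing wording for each class.
PLAIN_LANGUAGE = {
    PermissionClass.READ_ONLY: "can read your data but not change it",
    PermissionClass.WRITE: "can create or edit items on your behalf",
    PermissionClass.DESTRUCTIVE: "can delete or overwrite items",
    PermissionClass.OPEN_WORLD: "can contact services outside this workspace",
    PermissionClass.PAYMENT: "can spend money or start a purchase",
    PermissionClass.SENSITIVE_DATA: "can access sensitive personal data",
    PermissionClass.WORKSPACE_WIDE: "can reach everything in the workspace",
}


def disclosure(tool_name: str, classes: set[PermissionClass]) -> list[str]:
    """Render one plain-language line per granted class, in enum order."""
    ordered = [c for c in PermissionClass if c in classes]
    return [f"{tool_name}: {PLAIN_LANGUAGE[c]}" for c in ordered]
```

Rendering in a fixed order keeps the disclosure stable across tools, so a user comparing two connectors sees the same classes in the same positions.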
It should make tool metadata auditable. Names, descriptions, annotations, scopes, privacy-policy links, support contacts, and version histories should be part of the public record for published tools. If a tool changes from read-only to write-capable, that should not be a quiet cosmetic update.
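One way to keep such a change from being quiet is to diff each new version against the last published one and surface any increase in reach. The sketch below assumes a hypothetical `ToolVersion` record with a `read_only` flag and a set of scope strings. Real directories may store this metadata differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolVersion:
    """Hypothetical public record for one published version of a tool."""
    version: str
    description: str
    read_only: bool
    scopes: frozenset[str]


def escalations(old: ToolVersion, new: ToolVersion) -> list[str]:
    """List the ways the new version reaches further than the old one."""
    changes = []
    # A flip from read-only to write-capable is the case the text singles out.
    if old.read_only and not new.read_only:
        changes.append("read-only -> write-capable")
    # Any scope present now that was not present before.
    added = sorted(new.scopes - old.scopes)
    if added:
        changes.append("new scopes: " + ", ".join(added))
    return changes
```

A directory could hold a version for re-review, or notify administrators, whenever this list is non-empty. Description-only edits would pass through without triggering it.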
It should require receipts for action. When an assistant uses a tool, the user and the relevant institution should be able to reconstruct what was called, what authority was used, what data was sent, what result came back, and what external state changed. This connects directly to the site's earlier analysis of agent logs as receipts.
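A receipt of this kind can be sketched as a small record with one field per question in the paragraph above. Chaining each receipt to the digest of the previous one is an added assumption here, used to make later edits detectable. The field names and the `verify_chain` helper are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Receipt:
    tool: str            # what was called
    authority: str       # which grant or scope was used
    data_sent: dict      # arguments that left the assistant boundary
    result_summary: str  # what came back
    state_changed: bool  # whether external state changed
    prev_digest: str     # digest of the previous receipt, "" for the first

    def digest(self) -> str:
        """SHA-256 over a canonical JSON form of the receipt."""
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_chain(receipts: list[Receipt]) -> bool:
    """Check that each receipt points at the digest of the one before it."""
    prev = ""
    for receipt in receipts:
        if receipt.prev_digest != prev:
            return False
        prev = receipt.digest()
    return True
```

If any earlier receipt is altered after the fact, the next receipt's `prev_digest` no longer matches and `verify_chain` returns `False`. This detects tampering within the log. It does not prove the log was complete or honest when written.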
It should preserve user choice without pretending all choice is equal. Open directories, custom connectors, sideloaded tools, and private workspace apps matter for innovation and institutional autonomy. But a hospital, school, court, bank, newsroom, or public agency cannot treat random agent extensions as casual browser add-ons. The governance standard should rise with consequence.
It should treat removal as institutional memory. If an app is delisted for policy, privacy, fraud, security, or reliability reasons, the platform should maintain enough disclosure for affected users and administrators to respond. Silent disappearance may protect a platform, but it weakens the record that institutions need for accountability.
The Spiralist Reading
The agent store is the app store after the interface learned to speak.
That sounds like a product change. It is an institutional change. The store no longer only distributes software. It helps decide which external capabilities become part of a model's practical world. The directory is a permission map, a trust registry, a ranking system, a policy engine, and a discovery surface folded into conversation.
Recursive reality appears when the model's answer changes the user's path, the path changes which tools gain usage, usage changes ranking, ranking changes future suggestions, and future suggestions define what users experience as the natural way to act. The store becomes a feedback loop. The most available capability becomes the most normal capability.
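The loop can be made concrete with a deliberately crude toy model. It assumes the assistant always suggests the tool with the highest usage count and that every suggestion is accepted. Neither assumption describes how any real ranking system works, and the function name is hypothetical.

```python
def simulate(usage: dict[str, int], rounds: int) -> dict[str, int]:
    """Toy model: suggest the top-ranked tool each round and count the use."""
    counts = dict(usage)
    for _ in range(rounds):
        # Rank by usage; ties go to the alphabetically first name.
        top = max(sorted(counts), key=lambda name: counts[name])
        # The suggestion is accepted, which feeds back into the ranking.
        counts[top] += 1
    return counts
```

Under these assumptions a one-use head start is never lost: starting from `{"a": 3, "b": 2}`, ten rounds leave `b` at 2 while `a` reaches 13. Real systems add noise, quality signals, and user choice, but the direction of the feedback is the same.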
The right response is not panic about tool use. Assistants need tools if they are going to do useful work. The danger is unexamined tool distribution: agent ecosystems that inherit the commercial logic of app stores while gaining the delegated authority of assistants.
An agent directory is not just a list. It is a small constitution for model-mediated action. It defines who can enter, what they must disclose, what the model is allowed to believe about them, what users are shown, what administrators can govern, and what happens when trust fails.
The question for the next interface layer is therefore not "How many apps are in the store?" It is "What kind of institution is this store becoming?"
Sources
- OpenAI, Submit and maintain your app, reviewed May 2026.
- OpenAI, App submission guidelines, reviewed May 2026.
- OpenAI, App Developer Terms, updated December 17, 2025.
- Anthropic, Connectors directory documentation, reviewed May 2026.
- Anthropic, Software Directory Policy, April 15, 2026.
- Apple, App Review Guidelines, reviewed May 2026.
- European Commission, App distribution under the Digital Markets Act, reviewed May 2026.
- OWASP Foundation, Top 10 for Large Language Model Applications, reviewed May 2026.
- Church of Spiralism, The Agent-to-Agent Protocol Becomes the Handshake, The Tool Server Becomes the Trust Boundary, The AI Browser Becomes the Control Surface, and The Agent Log Becomes the Receipt.