Blog · Analysis · Last reviewed June 25, 2026

The Agent Store Becomes the App Store

When AI assistants can discover, suggest, install, and call third-party tools inside conversation, app review becomes governance for delegated action, not only software distribution.

An agent store is not just a catalog. It is a registry of capabilities the model may describe, recommend, invoke, and route through user or workspace authority.

The New Shelf

The familiar app store put software behind a shelf: search results, categories, rankings, reviews, developer accounts, policy rules, payment rules, removal powers, and a review process that decides what may be distributed through a dominant interface.

For this essay, an agent store is a curated or searchable distribution and governance layer for model-callable capabilities: apps, connectors, plugins, remote MCP servers, desktop extensions, and private workspace tools. It is a capability registry, not merely a storefront. It has five planes. The listing plane makes a capability discoverable. The review plane records metadata, permissions, policies, test evidence, and developer identity. The permission plane decides who may install or invoke it and under what action classes. The runtime plane is where a model actually calls a live endpoint under delegated authority. The accountability plane preserves version history, logs, incident status, removal reasons, and administrator registers.

The agent store is forming inside AI assistants. OpenAI's Apps SDK lets developers build apps that run inside ChatGPT, backed by MCP servers and optional user interfaces, and OpenAI documentation says publishing an approved app creates a Codex plugin for distribution. Anthropic's directory model lists MCP integrations that work across Claude products, while distinguishing verified connectors from community connectors and custom connectors. Both systems turn external services into assistant-callable capabilities rather than icons a person opens directly.

That shift matters. A normal app waits for a user to launch it. An agent app can be invoked by a model in the middle of a task, or suggested by the platform because it appears relevant to the user's request. The store is no longer only a marketplace. It is a routing layer between user intent, model interpretation, external APIs, approval prompts, administrator policy, and actions in the world.

The old app-store question was: should this software be allowed on the platform? The new question is sharper: when should a model be allowed to choose, describe, call, and trust this software on behalf of a person?

Current Context

As of June 25, 2026, the agent-store layer is no longer hypothetical. OpenAI's current docs use "apps" for both interactive ChatGPT apps and data connectors after a December 17, 2025 terminology change. Its submission documentation says public app distribution still runs through a dashboard review flow; approval is followed by publishing; published apps can initially be found by direct link or name search; and directory placement or proactive suggestions are separate enhanced-distribution opportunities, with no self-service request path. The same documentation distinguishes public submission from private or workspace use through developer mode, and says a published app version stores a reviewed snapshot of MCP metadata, including tool names, schemas, annotations, security schemes, UI metadata, and MCP server instructions, but live tool calls still execute against the developer's live endpoint.

OpenAI's user and admin documentation adds the permission layer. App permissions decide when ChatGPT asks before using access that already exists; they do not grant an app new access by themselves. Enabled apps may receive relevant conversation context, and app interactions can involve Memory or web-search context depending on user and workspace settings. In Business, Enterprise, and Edu workspaces, owners or admins can manage availability from workspace settings; Enterprise and Edu apps are disabled by default; Business apps are enabled by default; RBAC, action controls, parameter constraints for some non-sync apps, domain limits, and compliance logs are part of the current control surface. Custom MCP servers remain third-party systems that OpenAI says are not verified by OpenAI.

Anthropic's directory model is parallel but not identical. Its Connectors Directory is a catalog of MCP servers that work across Claude products and includes both Verified connectors, which Anthropic says it has reviewed for security, reliability, and compatibility, and Community connectors, which pass automated checks but are not reviewed in depth. Directory connectors are automatically eligible for in-chat Suggested Connectors, and Anthropic says ranking is usage-based. Its submission docs treat remote MCP servers, desktop extensions, and MCP Apps as related but distinct submission types, with review fields for URLs, transport, tool annotations, authentication, data handling, test accounts, use cases, policy acknowledgments, and allowed link targets. The submission flow also treats directory inclusion as a change in discoverability, not a change in the connector's underlying behavior. Its Software Directory Policy forbids hidden instructions, mismatched tool descriptions, excessive data collection, and unsupported categories such as financial transactions and advertising-focused software.

Anthropic also exposes a different versioning risk. Its post-publication documentation says an MCP server is a live API: after publication, developers can add, change, or remove tools on the server without resubmission or scheduled re-review, and Claude picks up the new tool surface on the next connection. That may be useful for ordinary API maintenance, but it means the directory label, listing text, live tool surface, and user's connected endpoint are separate governance objects.

OpenAI's App Developer Terms add the contractual layer. They say OpenAI does not guarantee placement, visibility, ranking, promotion, or conversational suggestions, and may reject or remove an app. They also prohibit apps from initiating or facilitating money transfers, cryptocurrency transfers, or financial or investment transactions through the services. These are not only product policies. They are early rules for which kinds of delegated action can enter a dominant assistant interface.

The enterprise layer is now part of the same story: workspace owners and admins decide which apps are available, to which roles, and with which actions. That makes the "store" partly public directory, partly private allowlist, partly plugin marketplace, partly data-governance surface, and partly institutional risk register.

The current platform shape is therefore plural: public directories, private workspace connectors, custom MCP servers, Codex plugins, desktop extensions, admin-controlled workspace apps, and ordinary mobile app stores all coexist. "Agent store" is the right term only if it stays precise: review, ranking, metadata, permissions, and logs shape what an assistant can do in practice, but those controls remain platform claims unless they are independently auditable.

The governance implication is concrete: public directory approval is not the same as an organization's internal risk acceptance. A platform can review a listing, test declared tools, or restrict a category of transactions, but a hospital, school, court, bank, newsroom, or public agency still has to decide whether that capability fits its own data classes, legal duties, retention rules, incident process, and human-approval gates.

Why This Is Not Just Apps Again

There is continuity. Apple says the App Store is curated, reviewed for safety, scanned for malware, and governed by rules around performance, business practices, design, and law. The European Union's Digital Markets Act treats app stores as core platform services because they can become gatekeepers between business users and end users. App distribution has always been institutional power.

Agent directories inherit that power and add four new properties.

First, the model mediates discovery. A user may not browse a category page. The assistant may suggest an app, select a connector, or decide that a tool fits the request. Anthropic's connector documentation says directory connectors can be eligible for in-chat recommendations and that ranking is usage-based. OpenAI's submission documentation says apps with strong utility and user satisfaction may receive enhanced distribution such as directory placement or proactive suggestions. That is not neutral plumbing. It is recommendation infrastructure.

Second, the tool description becomes executable persuasion. In an app store, metadata persuades a person. In an agent directory, tool names, descriptions, annotations, and policy fields also instruct the model about when and how to call the tool. A misleading description is not only bad marketing. It can change machine behavior, especially when the model is choosing among multiple tools that appear to satisfy the same intent.

Third, installation grants operational reach. A connector may read a workspace, fetch private records, create tickets, send messages, update documents, or trigger workflows. The risk is not only that an app displays bad content. The risk is that a model-mediated tool changes state under an authority path the user only partly understands. That puts agent stores in the same governance family as the tool server trust boundary, agent identity, and the Agent Tool Permission Protocol.

Fourth, policy memory changes future action. Once a user, team, or admin has allowed an app, later workflows may treat that permission as background infrastructure. The relevant question is not only whether the first approval was informed. It is whether approval memory stays scoped to the same user, tool, endpoint, action class, data category, and version.

Review Becomes Runtime Governance

The platform documents already show the outline of a new regulatory grammar.

OpenAI requires app submissions to pass a review flow before public distribution. The submission process asks for MCP server details, app descriptions, privacy policy URLs, screenshots, test prompts and responses, tool information, and other materials. The review may include automated scans and manual review. Published apps can later be removed or restricted for policy violations, instability, inactivity, or legal and security concerns.

OpenAI's app guidelines describe MCP tools as the manual ChatGPT uses to operate an app. The guidelines focus on tool-level behavior: accurate tool names, descriptions that match behavior, correct labels for read-only, write, destructive, and open-world actions, minimal inputs, privacy policy disclosure, response minimization, and explicit handling of side effects. These are not decorative requirements. They are the control language by which a conversational system decides whether an action is safe enough to expose to a user.

The versioning detail is especially important. OpenAI says published app metadata should be treated as a versioned API contract: tool lists, names, descriptions, schemas, annotations, security schemes, UI metadata, and MCP server instructions are stored as a reviewed snapshot. But the MCP server remains live: tool calls execute against the developer's endpoint and return live business data. Review therefore governs the contract and the declared surface; runtime governance still has to watch what the live system actually does, and administrators need a way to see when refreshed actions, endpoint behavior, or available scopes diverge from what was approved.

Anthropic's connector documentation shows the other side of the same problem: when the server surface can change without resubmission or scheduled re-review, the directory label cannot be treated as a frozen safety certificate. A mature directory therefore needs to say what was reviewed, what is live, what changed since installation, and whether the current tool surface still matches the institution's approval.
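The gap between "what was reviewed" and "what is live" can be checked mechanically. The sketch below is a minimal illustration, not any platform's actual mechanism: it assumes both the reviewed snapshot and the live server's tool list are available as plain dictionaries, and the tool names (search_tickets, get_ticket, delete_ticket) and the function names are hypothetical.

```python
import hashlib
import json


def fingerprint(tool: dict) -> str:
    """Stable hash of one tool's model-visible metadata."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_tool_surface(reviewed: list[dict], live: list[dict]) -> dict[str, list[str]]:
    """Compare a reviewed snapshot with the tool list a live server reports now."""
    reviewed_by_name = {t["name"]: fingerprint(t) for t in reviewed}
    live_by_name = {t["name"]: fingerprint(t) for t in live}
    return {
        "added": sorted(live_by_name.keys() - reviewed_by_name.keys()),
        "removed": sorted(reviewed_by_name.keys() - live_by_name.keys()),
        "changed": sorted(
            name
            for name in reviewed_by_name.keys() & live_by_name.keys()
            if reviewed_by_name[name] != live_by_name[name]
        ),
    }


# Hypothetical data: what a reviewer saw versus what the endpoint exposes today.
reviewed_snapshot = [
    {"name": "search_tickets", "description": "Search tickets.",
     "annotations": {"readOnlyHint": True}},
    {"name": "get_ticket", "description": "Fetch one ticket.",
     "annotations": {"readOnlyHint": True}},
]
live_surface = [
    {"name": "search_tickets", "description": "Search and update tickets.",
     "annotations": {"readOnlyHint": False}},
    {"name": "get_ticket", "description": "Fetch one ticket.",
     "annotations": {"readOnlyHint": True}},
    {"name": "delete_ticket", "description": "Delete a ticket.",
     "annotations": {"destructiveHint": True}},
]

drift = diff_tool_surface(reviewed_snapshot, live_surface)
print(drift)
# {'added': ['delete_ticket'], 'removed': [], 'changed': ['search_tickets']}
```

A check like this only compares declared metadata. It says nothing about whether the endpoint's behavior matches its declarations, which is why the runtime plane still needs its own controls.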

That creates a four-part split a mature store has to name: the public directory listing, the reviewed metadata contract, the live endpoint behavior, and the workspace enablement decision. Treating those as one object invites mistakes. The listing can be safe enough to publish while the local deployment is still wrong for a particular account, user role, dataset, jurisdiction, or workflow.

The MCP specification itself reinforces the limit. Tool annotations such as readOnlyHint, destructiveHint, idempotentHint, and openWorldHint are useful risk vocabulary, but the spec calls them hints and warns clients not to make tool-use decisions from untrusted annotations alone. A directory can require annotations, but safety still needs deterministic controls: scope enforcement, confirmations, server identity, runtime checks, sandboxing, and receipts.
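One way to read "hints, not decisions" is that server-supplied annotations may tighten a locally held policy but never relax it. The following is a minimal sketch under that assumption. The ADMIN_POLICY table, the Decision enum, and the gate function are hypothetical names, and treating an absent hint as unset is a simplification made for this example.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"
    DENY = "deny"


# Hypothetical admin-maintained policy: the deterministic source of truth.
ADMIN_POLICY = {
    "search_tickets": Decision.ALLOW,
    "update_ticket": Decision.CONFIRM,
}

_STRICTNESS = {Decision.ALLOW: 0, Decision.CONFIRM: 1, Decision.DENY: 2}


def gate(tool_name: str, annotations: dict) -> Decision:
    """Decide from local policy; server-supplied hints can only tighten it."""
    # Tools the administrator never listed are denied, whatever they claim.
    base = ADMIN_POLICY.get(tool_name, Decision.DENY)
    hinted = Decision.ALLOW
    if annotations.get("destructiveHint") or annotations.get("openWorldHint"):
        hinted = Decision.CONFIRM
    # Take the stricter result, so a false readOnlyHint cannot relax policy.
    return max(base, hinted, key=lambda d: _STRICTNESS[d])


print(gate("search_tickets", {"readOnlyHint": True}))     # Decision.ALLOW
print(gate("search_tickets", {"destructiveHint": True}))  # Decision.CONFIRM
print(gate("update_ticket", {"readOnlyHint": True}))      # Decision.CONFIRM
print(gate("wipe_all", {"readOnlyHint": True}))           # Decision.DENY
```

The third and fourth calls are the important ones: a tool that labels itself read-only still gets the confirmation or denial the local policy assigns it.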

Anthropic's Software Directory Policy moves in the same direction. It requires reviewed software to meet safety, security, privacy, compatibility, and developer requirements. It says tool descriptions must be narrow and unambiguous, must match actual functionality, must not conflict with other listed software, must not pull behavioral instructions from external sources, and must not contain hidden or obfuscated instructions. It also requires privacy policies, support channels, documentation, test accounts, working examples, and relevant MCP annotations.

This is app review becoming runtime governance. The review team is not merely asking whether an app works. It is asking whether a model can understand the tool, whether the tool hides side effects, whether it collects excess context, whether it can be safely retried, and whether its metadata manipulates tool selection.

Discovery Is Power

App stores taught the industry that distribution rules are governance rules. Ranking, featuring, search, category placement, steering restrictions, payments, identity verification, and removal shape what developers build and what users see.

Agent directories make that power more intimate because the storefront can disappear into the answer. A person asks for help planning a trip, ordering groceries, making a slide deck, filing a ticket, checking a calendar, searching a drive, or sending a message. The assistant may decide that a tool belongs in the path. If that tool appears because of usage ranking, platform partnership, safety review, hidden eligibility rules, or proactive suggestion logic, then the assistant is not only answering. It is allocating access to an ecosystem.

OpenAI's terms say it does not guarantee placement, visibility, ranking, promotion, or conversational suggestions for an app. That is a reasonable platform reservation. It is also a map of the power at stake. The platform can decide whether an agent tool remains searchable, appears only by direct link, receives enhanced distribution, or is removed entirely.

The governance question is not whether directories should curate. They must. The question is whether curation is legible enough for developers, users, auditors, and institutions to understand why one capability is visible, another buried, and a third unavailable. A directory should be able to distinguish at least four states: reviewed for baseline eligibility, searchable by name, recommended in conversation, and invoked automatically or semi-automatically inside a workflow.

It should also disclose the kind of recommendation at work. A user and an auditor need to know whether a tool appeared because it was installed by the user, allowed by an admin, ranked by aggregate usage, selected for enhanced distribution, bundled by a platform relationship, or suggested by the model from the visible prompt. Those are different governance facts even when the interface renders them as one helpful suggestion.

The Permission Problem

The biggest practical risk is permission translation.

Humans understand app permissions imperfectly even when the interface is explicit. Agent tools make that harder because permission is mixed with natural language intent. A user asks for a summary. The model decides what data to retrieve. A connector returns records. Another tool may be available to send, update, delete, buy, publish, or file. The difference between reading and acting has to be made visible inside a flow optimized for convenience.

This is where AI-specific security work matters. OWASP's LLM application project highlights risks such as prompt injection, supply-chain vulnerabilities, sensitive-information disclosure, improper output handling, excessive agency, and unbounded consumption. Agent stores sit near all of those risks because they distribute third-party capabilities into a model-controlled environment.

A malicious or sloppy connector does not need to defeat the whole platform to cause harm. It can over-collect context, return hidden instructions, mislabel a write action as read-only, leak identifiers in tool responses, confuse the model with broad descriptions, or create side effects that are hard to inspect after the fact. A well-run directory can reduce those risks, but it also centralizes trust in the directory operator's review process.

MCP-specific security guidance sharpens the point. The official MCP security best practices warn against token passthrough, weak OAuth state handling, redirect mistakes, SSRF, and audit-trail confusion. OWASP's MCP Top 10 names token exposure, scope creep, tool poisoning, supply-chain attacks, command injection, insufficient authorization, missing telemetry, shadow servers, and context over-sharing. An agent store is therefore not only a storefront. It is a supply-chain and authorization boundary for model-mediated action.

The permission vocabulary has to be shared across people, models, and administrators. A useful directory should not bury action classes inside schema fields. It should show whether a tool can read, write, delete, send, publish, purchase, transfer data outside the workspace, touch regulated data, or act through a named agent identity. The same tool can be harmless in one account and dangerous in another if the connected credential has broader scope.

The safest permission model separates discovery, installation, invocation, and durable approval memory. A tool found in search should not be treated as enabled. An enabled read tool should not silently become a write tool. A remembered approval should expire or be rechecked when the task, conversation, version, endpoint, action class, data category, or operator changes.
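Scoped approval memory can be sketched as a lookup keyed by every attribute the approval is supposed to be bound to, plus an expiry. This is an illustrative sketch only: ApprovalScope, ApprovalMemory, the eight-hour lifetime, and the example endpoint URL are assumptions made for the example, not features of any platform.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ApprovalScope:
    """Everything an approval is bound to; any change yields a different key."""
    user: str
    tool: str
    endpoint: str
    version: str
    action_class: str
    data_category: str


class ApprovalMemory:
    def __init__(self, ttl: timedelta) -> None:
        self._ttl = ttl
        self._granted_at: dict[ApprovalScope, datetime] = {}

    def remember(self, scope: ApprovalScope, now: datetime) -> None:
        self._granted_at[scope] = now

    def is_approved(self, scope: ApprovalScope, now: datetime) -> bool:
        """True only for the exact scope that was approved, and only until expiry."""
        granted = self._granted_at.get(scope)
        return granted is not None and now - granted < self._ttl


memory = ApprovalMemory(ttl=timedelta(hours=8))
t0 = datetime(2026, 6, 25, 9, 0, tzinfo=timezone.utc)
read_scope = ApprovalScope(
    user="alice", tool="tickets", endpoint="https://tools.example.com/mcp",
    version="1.4.0", action_class="read", data_category="support",
)
memory.remember(read_scope, t0)

write_scope = replace(read_scope, action_class="write")
new_version = replace(read_scope, version="1.5.0")

print(memory.is_approved(read_scope, t0 + timedelta(hours=1)))   # True
print(memory.is_approved(write_scope, t0 + timedelta(hours=1)))  # False
print(memory.is_approved(new_version, t0 + timedelta(hours=1)))  # False
print(memory.is_approved(read_scope, t0 + timedelta(hours=9)))   # False
```

Because the scope is the dictionary key, an approved read never answers for a write, and a version bump forces a recheck without any special-case logic.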

The user-facing question should be simple: what can this tool see, what can it change, who operates it, what data leaves the assistant boundary, how can I revoke it, and what log proves what happened?

The institutional version needs a stronger record. OpenAI's enterprise app guidance already points toward workspace enablement, role-based access, app action controls, approval settings, and compliance logs. Those controls are necessary but not sufficient. A hospital, newsroom, school, agency, or bank still needs its own AI system inventory, audit trail, and agent observability record for the tools it allows into consequential workflows.

Failure Modes

Metadata poisoning. Tool names, descriptions, examples, schemas, annotations, and server instructions help the model decide what to call. A misleading description is not merely bad copy. It can become routing advice for an automated actor.

Review-runtime drift. A directory may review a metadata snapshot while live calls continue against the developer's endpoint. That is necessary for real services, but it creates a gap between the reviewed contract and the behavior users experience after deployment.

Ownership and endpoint drift. A listing may keep the same name while the operator, endpoint, underlying API, support contact, subprocessors, or data-processing commitments change. If the store does not surface those changes, users and administrators may keep trusting a tool whose institutional boundary has moved.

Permission laundering. A user asks for an ordinary outcome, the assistant translates that request into broad data access, and a connector converts it into API calls. If the interface does not show the data category, action class, and external recipient, consent becomes a mood rather than a record.

Lookalike listings. A tool can borrow a familiar brand name, icon style, endpoint pattern, or one-word generic label. Directory review has to verify operator identity, endpoint control, support contact, privacy policy ownership, and name clarity, because model-mediated discovery gives misleading metadata operational force.

Tool-chaining opacity. Two separately reasonable tools can become unsafe together: one reads private records, another sends messages, uploads files, posts externally, or opens links. The directory may approve each tool in isolation while the assistant composes them into a data-exfiltration path.

Recommendation capture. Usage-based ranking, enhanced distribution, partner placement, and proactive suggestions can make one tool feel like the natural path. The assistant's convenience layer can quietly become market structure.

Review laundering. A team treats platform approval, verified-connector status, or a directory listing as if it were internal authorization for its own data, roles, legal duties, and incident process. Public eligibility becomes a substitute for local risk acceptance.

Label confusion. Verified, Community, custom, published, approved, installed, and enabled can sound like the same trust state to a user. They are not. Some labels mean manual review, some mean automated checks, some mean local configuration, and some mean only that a capability is reachable.

Approval-memory creep. A one-time permission remembered for an app, conversation, user, or workspace survives a material change in tool version, endpoint, action class, connected credential, or data category. The user thinks they approved one path; the system remembers a broader one.

Shadow installation. Private workspace tools, custom MCP servers, desktop extensions, and local plugins can bypass public directory review. That flexibility is useful, but in organizations it needs the same discipline as vendor and platform governance, agent sandboxing, and agentic supply-chain management.

Receipt gaps. If the system cannot reconstruct which tool was suggested, which tool was approved, which endpoint was called, what arguments were sent, and what changed outside the chat, incident review becomes guesswork.

Policy monoculture. Dominant assistant platforms can define which categories of tools are acceptable, which business models are impossible, and which forms of action require friction. Some gatekeeping is needed for safety. The danger is opaque gatekeeping that becomes infrastructure before it becomes accountable.
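The tool-chaining failure mode above lends itself to a simple static check: flag every pair of enabled tools where one can read private data and another can send data outside the boundary. The sketch below assumes each tool carries capability tags. The tag names reads_private and sends_external, and the tool names, are hypothetical.

```python
from itertools import permutations


def risky_chains(tools: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (source, sink) pairs that together form a possible exfiltration path.

    Only pairs of distinct tools are considered; a single tool that both reads
    private data and sends it out needs its own review and is not flagged here.
    """
    return sorted(
        (source, sink)
        for source, sink in permutations(tools, 2)
        if "reads_private" in tools[source] and "sends_external" in tools[sink]
    )


# Hypothetical set of tools enabled in one workspace.
enabled = {
    "crm_lookup": {"reads_private"},
    "calendar_view": {"reads_private"},
    "send_email": {"sends_external"},
    "unit_convert": set(),
}

print(risky_chains(enabled))
# [('calendar_view', 'send_email'), ('crm_lookup', 'send_email')]
```

A flagged pair is not proof of harm. It marks a combination that deserves a confirmation step or an administrator decision before the assistant is allowed to compose it.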

A Better Agent Directory

A mature agent directory should be judged less like a promotional marketplace and more like a public-facing control surface.

It should separate listing from recommendation. Being approved for safe use is not the same as being suggested inside conversation. Users and developers should be able to distinguish baseline availability from boosted discovery, usage-based ranking, partner placement, and proactive invocation.

It should keep trust labels precise. Verified, community, custom, public, private, installed, enabled, and recommended should be visible as different states. The label should travel into the chat surface, the admin register, and the action receipt so a later reviewer can tell whether the platform manually reviewed the capability, ran only automated checks, or did not review it at all.

It should separate discovery from enablement. Searchability, installation, invocation, and remembered approval should be distinct states with distinct records. This matters most when the model is allowed to move from "I found a tool" to "I called a tool" without a visible app-launch moment.

It should expose permission classes before use. Read-only, write, destructive, open-world, payment-related, sensitive-data, and workspace-wide capabilities should be visible in plain language, not only in developer metadata.

It should disclose data-flow boundaries. A user should know whether the app receives prompt snippets, retrieved records, memory-derived context, approximate location, account identifiers, files, search context, tool arguments, or external-link requests, and whether those data flows are governed by the platform, the developer, the workspace, or all three.

It should make tool metadata auditable. Names, descriptions, annotations, scopes, verified developer identity, endpoint ownership, privacy-policy links, support contacts, and version histories should be part of the public record for published tools. If a tool changes from read-only to write-capable, that should not be a quiet cosmetic update. If a live endpoint's behavior diverges from the reviewed metadata contract, the directory should have a fast path for restriction, rollback, and incident notice.

It should verify operator and endpoint ownership. Organization verification, support contacts, privacy policy URLs, OAuth configuration, allowed link targets, domain control, and ownership changes are not clerical details. They define who is receiving user data and who can be held responsible when the connector fails.

It should pin the contract that the model sees. Reviewed tool lists, schemas, annotations, server instructions, UI resource metadata, and security schemes should be signed, versioned, and compared to the runtime surface. The directory should know when the model is being asked to rely on text that was not reviewed.

It should require receipts for action. When an assistant uses a tool, the user and the relevant institution should be able to reconstruct what was called, what authority was used, what data was sent, what result came back, and what external state changed. The same issue appears in agent logs as receipts.
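A receipt of this kind can be made tamper-evident by chaining each record to the hash of the one before it. This is a minimal sketch under stated assumptions: the field names (tool, endpoint, authority, arguments, result_summary, state_change) follow the list in the paragraph above but are otherwise hypothetical, and a real system would also need signing and protected storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first receipt in a chain


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_receipt(chain: list[dict], **fields) -> dict:
    """Add a receipt whose hash covers its fields and the previous receipt's hash."""
    body = dict(fields, prev_hash=chain[-1]["hash"] if chain else GENESIS)
    receipt = dict(body, hash=_digest(body))
    chain.append(receipt)
    return receipt


def verify(chain: list[dict]) -> bool:
    """Recompute every hash and link; any edited or reordered receipt fails."""
    prev = GENESIS
    for receipt in chain:
        body = {k: v for k, v in receipt.items() if k != "hash"}
        if body["prev_hash"] != prev or _digest(body) != receipt["hash"]:
            return False
        prev = receipt["hash"]
    return True


log: list[dict] = []
append_receipt(
    log, tool="search_tickets", endpoint="https://tools.example.com/mcp",
    authority="user:alice scope:read", arguments={"query": "refund"},
    result_summary="3 records returned", state_change=None,
)
append_receipt(
    log, tool="update_ticket", endpoint="https://tools.example.com/mcp",
    authority="user:alice scope:write", arguments={"id": 42, "status": "closed"},
    result_summary="ok", state_change="ticket 42 status set to closed",
)

tampered = json.loads(json.dumps(log))  # deep copy via a JSON round trip
tampered[0]["arguments"]["query"] = "payroll"

print(verify(log), verify(tampered))  # True False
```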

It should preserve user choice without pretending all choice is equal. Open directories, custom connectors, sideloaded tools, and private workspace apps matter for innovation and institutional autonomy. But a hospital, school, court, bank, newsroom, or public agency cannot treat random agent extensions as casual browser add-ons. The governance standard should rise with consequence, especially where enterprise connectors become permission maps or where a payment agent becomes the cashier.

It should require internal acceptance for high-consequence use. Public directory review should be baseline eligibility, not final approval. Internal enablement should name the owner, purpose, data classes, scopes, model-visible metadata, retention rule, audit path, support contact, incident contact, and revocation plan.

It should treat removal as institutional memory. If an app is delisted for policy, privacy, fraud, security, or reliability reasons, the platform should maintain enough disclosure for affected users and administrators to respond. Silent disappearance may protect a platform, but it weakens the record that institutions need for audit and incident review.

It should give administrators a real register. Organizations need a list of approved, installed, disabled, suggested, and blocked capabilities, with owners, scopes, versions, action classes, data categories, connected systems, logs, sandbox status, and revocation paths. Without that register, the agent store becomes shadow IT with a conversational interface.

It should connect to procurement and assurance. High-consequence organizations need vendor review, data-processing terms, accessibility review, security evidence, incident contacts, and periodic recertification before a model-callable app touches systems of record. That puts mature directories beside AI procurement, AI audits and assurance, privacy and data, and public registers.

What This Changes

The agent store is the app store after the interface learned to speak.

That sounds like a product change. It is an institutional change. The store no longer only distributes software. It helps decide which external capabilities become part of a model's practical world. The directory is a permission map, a trust registry, a ranking system, a policy engine, and a discovery surface folded into conversation.

Recursive reality appears when the model's answer changes the user's path, the path changes which tools gain usage, usage changes ranking, ranking changes future suggestions, and future suggestions define what users experience as the natural way to act. The store becomes a feedback loop. The most available capability becomes the most normal capability.
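The loop can be made concrete with a toy model. The numbers here are invented for illustration and are not measurements of any platform: it assumes 100 requests per round and that whichever tool currently leads in usage is suggested first and therefore receives 70 percent of them.

```python
def simulate(usage: dict[str, int], rounds: int,
             requests: int = 100, leader_percent: int = 70) -> dict[str, int]:
    """Toy usage-ranking loop: the current leader is suggested and gets most calls."""
    usage = dict(usage)  # work on a copy so the caller's data is untouched
    for _ in range(rounds):
        leader = max(usage, key=usage.get)
        others = [name for name in usage if name != leader]
        leader_calls = requests * leader_percent // 100
        usage[leader] += leader_calls
        for name in others:
            # Remaining requests are split evenly among the non-leaders.
            usage[name] += (requests - leader_calls) // len(others)
    return usage


print(simulate({"tool_a": 51, "tool_b": 49}, rounds=10))
# {'tool_a': 751, 'tool_b': 349}
print(simulate({"tool_a": 49, "tool_b": 51}, rounds=10))
# {'tool_a': 349, 'tool_b': 751}
```

Under these assumptions a two-point starting lead becomes a share of roughly 68 percent after ten rounds, and swapping the starting positions swaps the outcome. The ranking rule, not any difference between the tools, produces the gap.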

The right response is not panic about tool use. Assistants need tools if they are going to do useful work. The danger is unexamined tool distribution: agent ecosystems that inherit the commercial logic of app stores while gaining the delegated authority of assistants.

An agent directory is not just a list. It is a small constitution for model-mediated action. It defines who can enter, what they must disclose, what the model is allowed to believe about them, what users are shown, what administrators can govern, and what happens when trust fails. That is one piece of the wider agent-native internet: identity, tool access, payments, connectors, logs, rankings, public registers, and human approval gates all learning to speak to one another.

The question for the next interface layer is therefore not "How many apps are in the store?" It is "What kind of institution is this store becoming?"

Source Discipline

Current-source claims were checked on June 25, 2026. This article treats OpenAI and Anthropic documentation as evidence of declared platform controls, not proof that every app, connector, plugin, or MCP server is safe. Product docs can change, help-center pages may describe staged rollouts, and review programs are not independent audits unless the platform publishes audit methods, results, enforcement data, and material incident histories.

Terms, submission guidelines, and directory policies are governance evidence of a different kind: they show what a platform reserves the right to approve, rank, restrict, remove, or forbid. They do not show whether a particular tool behaved safely in production, whether every live endpoint matched its reviewed metadata, or whether users and administrators understood the authority path.

The most important source split is between reviewed metadata and live behavior. A listing can show a reviewed snapshot, a policy can require accurate annotations, and an admin console can display an allowed action set. None of that proves the runtime endpoint is still behaving as reviewed. This is especially important where one platform uses reviewed snapshots for published metadata while another lets the live MCP server surface change after publication. For consequential use, source discipline has to ask for versioned metadata, runtime telemetry, revocation records, incident notices, and evidence that approval memory did not silently expand.

OWASP and MCP sources are used as security taxonomies and protocol guidance, not as measurements of incident frequency. Apple and European Commission sources are used for the platform-governance analogy: app stores are already treated as reviewed, ranked, removable distribution infrastructure. The agent-store claim is narrower: when that distribution layer becomes model-mediated, metadata, recommendation, permissions, and logs become part of the safety case.

Terms like "agent store" are descriptive, not a single official product category. OpenAI currently speaks in terms of ChatGPT apps, Apps SDK submission, app action controls, and Codex plugin distribution. Anthropic speaks in terms of connectors, MCP Apps, plugins, skills, and directories. The governance problem cuts across the naming.

Sources

OpenAI, MCP server concepts for Apps SDK, reviewed June 25, 2026.
OpenAI, Submit and maintain your app, reviewed June 25, 2026.
OpenAI, App submission guidelines, reviewed June 25, 2026.
OpenAI, Connect from ChatGPT: app permission settings, reviewed June 25, 2026.
OpenAI, Building MCP servers for ChatGPT Apps and API integrations, reviewed June 25, 2026.
OpenAI, App Developer Terms, updated December 17, 2025; reviewed June 25, 2026.
OpenAI Help Center, Apps in ChatGPT, reviewed June 25, 2026.
OpenAI Help Center, Admin Controls, Security, and Compliance in apps, reviewed June 25, 2026.
OpenAI, Developers can now submit apps to ChatGPT, December 17, 2025; reviewed June 25, 2026.
Anthropic, Connectors directory documentation, reviewed June 25, 2026.
Anthropic, Connector verification, reviewed June 25, 2026.
Anthropic, Submitting to the Connectors Directory, reviewed June 25, 2026.
Anthropic, Manage your listing after publishing, reviewed June 25, 2026.
Anthropic, Software Directory Policy, April 15, 2026; reviewed June 25, 2026.
Apple, App Review Guidelines and About App Store security, reviewed June 25, 2026.
European Commission, Digital Markets Act and App distribution under the Digital Markets Act, reviewed June 25, 2026.
OWASP Foundation, Top 10 for Large Language Model Applications and OWASP MCP Top 10, reviewed June 25, 2026.
Model Context Protocol, Security Best Practices, schema reference for tool annotations, and Tool Annotations as Risk Vocabulary, reviewed June 25, 2026.
NIST, AI Agent Standards Initiative and announcement, reviewed June 25, 2026.
