Wiki · Concept · Last reviewed June 25, 2026

OGX

OGX is an open-source AI application server for running agentic APIs, model-provider routing, tool use, retrieval, vector stores, files, and server-side orchestration behind a common HTTP boundary.

Definition

OGX is the current name of the open-source project formerly known as Llama Stack. Its documentation describes OGX as an AI application server that composes inference, vector stores, file storage, tool calling, and agentic orchestration into one OpenAI-compatible server. The upstream README says Llama Stack is now OGX and frames the project as an open-source agentic API server that can run on local, datacenter, or cloud infrastructure.

The rename matters because "Llama Stack" sounded tied to Meta's Llama model family. The April 28, 2026 rename post says the project had outgrown that framing: it is not Llama-only, not primarily an API standard, and not just a library imported into application code. The current project identity is server-first.

How It Works

OGX runs as an HTTP server. An application sends requests to a stable API surface, while the server handles provider adaptation, inference routing, tool execution, retrieval, file access, vector-store search, conversation state, and agent orchestration. Its docs list endpoints for chat completions, responses, embeddings, vector stores, files, batches, model listing, and messages.

The project calls the Responses API its distinctive feature because it moves the agent loop to the server. A single request can ask the server to call tools, connect to Model Context Protocol servers, search files through vector stores, and manage conversation state. The documentation also says the Responses implementation conforms to the Open Responses specification, with a separate conformance report for details.

OGX uses a pluggable provider architecture. The providers page says OGX composes inference providers, vector-store providers, tool runtimes, and file-storage options into one deployable server. The documentation distinguishes remote providers, which adapt external services, from inline providers, which run in process. That distinction is important for audits because a deployment might be local-only, cloud-backed, or a mix.

Agent Context

OGX sits at a sensitive boundary in agent systems. It is not merely a model endpoint. It can decide which provider receives a request, which tools are available, which vector store is searched, which file processor ingests documents, which moderation endpoint is called, and how server-side state is carried across turns.

That makes OGX useful for repeated agent workloads, but it also concentrates responsibility. If a coding assistant, support agent, or internal operations agent talks to OGX, the application transcript alone is not enough. The relevant record includes the OGX version, provider configuration, tool list, MCP registrations, vector-store contents, file-ingestion rules, safety settings, and model aliases active at the time.

Governance Use

A governance-grade OGX deployment record should preserve the OGX version or commit, container image digest, configuration files, environment variables, enabled APIs, provider list, model aliases, routing rules, vector-store providers, file storage, tool runtimes, MCP server registrations, guardrail or moderation endpoint, authentication layer, rate limits, logging policy, trace and metrics configuration, retention policy, and incident links.

OGX also belongs in change-management reviews. Because the server can swap providers without changing client code, a production behavior change may happen through configuration rather than an application release. That is useful for portability and dangerous for accountability if provider swaps, model alias changes, or tool registrations are not logged.

Limits

OGX is not a model evaluator, legal compliance system, or safety certification. OpenAI-compatible APIs do not guarantee identical behavior across model providers, and API translation can hide provider-specific limits. A request that looks portable at the client boundary may behave differently after tokenization, tool-call formatting, file retrieval, moderation, or provider routing.

The project is also moving quickly. The current documentation labels the main docs version as unreleased, and the rename from Llama Stack to OGX changed package names, CLI names, environment variables, headers, and repository organization. A stable institutional deployment should pin versions rather than treating web documentation as the runtime contract.

Source Discipline

Use OGX documentation and the upstream repository for claims about endpoints, providers, API compatibility, installation, and architecture. Use the rename announcement for historical claims about Llama Stack becoming OGX. Use Open Responses and Model Context Protocol primary materials only for claims about those external specifications, not as proof that a specific OGX deployment is safe or complete.

Spiralist Reading

Spiralism reads OGX as the agent loop becoming a server.

The old application called a model and waited. The new application sends work to an orchestration boundary where models, files, tools, memory, and policies are assembled into action. That boundary is powerful because it makes agents portable. It is risky for the same reason: when the server changes, the agent changes.

Sources


Return to Wiki