Wiki · Concept · Last reviewed June 25, 2026

LiteLLM

LiteLLM is an open-source AI gateway and Python SDK that lets teams call many model providers through OpenAI-format interfaces while centralizing routing, fallbacks, keys, budgets, spend tracking, and operational policy.

Definition

LiteLLM is an open-source library and gateway for calling many large-language-model providers through a common OpenAI-format interface. The upstream README describes it as an AI gateway for more than 100 LLMs, and the documentation presents two main uses: a Python SDK for direct calls and a self-hosted proxy server that applications can treat as an OpenAI-compatible endpoint.

The narrow definition matters. LiteLLM is not a model, a benchmark, a safety certification, or a claim that different providers behave the same. It is a translation and control layer between applications and model backends. Its governance significance comes from that middle position: the gateway can hide provider diversity from application code while making provider choice, routing policy, cost, and access control more inspectable.

How It Works

In SDK use, application code calls LiteLLM functions and receives responses in a consistent shape while LiteLLM handles provider-specific parameters and response handling. In proxy use, teams run a LiteLLM server with a configuration file that maps model names or aliases to provider endpoints, credentials, and routing settings. Client applications then send OpenAI-style requests to the proxy rather than embedding provider-specific code for every backend.

The documented proxy surface includes model lists, OpenAI-compatible routes, virtual keys, user and team controls, budgets, rate limits, spend tracking, load balancing, fallbacks, provider budget routing, guardrails, and an administrative interface. The project also documents support for providers and endpoints beyond a single chat API. These features make LiteLLM a practical place to enforce operational rules, but they also make its configuration part of the AI system itself.

A model alias is therefore not just a convenience label. It may point to one provider today, a fallback tomorrow, or a cost-routed pool during an outage. A budget rule may constrain a virtual key, a team, a provider, or a model group. A logging setting may decide whether prompts, outputs, metadata, and spend records are available for later review.

Agent Context

Agents often benefit from a single model endpoint because planners, tool callers, evaluators, and background workers can share one client. LiteLLM can make that architecture cleaner by letting those components use one base URL while platform teams move provider details into gateway policy.

The same abstraction can also make agent evidence ambiguous. If a workflow log says only that an agent called "gpt-4o" or "company-chat," it may omit the actual provider, region, fallback path, virtual key, budget state, retries, and guardrail layer active at the time. For consequential agents, the gateway decision is part of the action record. The answer is not only what the model returned; it is also how the request was routed, authorized, metered, and logged.

Governance Use

A governance-grade LiteLLM record should preserve the LiteLLM version or commit, container image digest, configuration file, model aliases, provider names, provider regions where known, credential source, virtual key, user, team, budget policy, rate limit policy, retry settings, timeout settings, load-balancing rules, fallback rules, guardrail settings, cache settings, request and response logging policy, redaction policy, spend logs, admin role changes, and incident links.

That record should be joined to product logs and agent traces. If an audit trail keeps prompts but drops gateway metadata, reviewers may be unable to tell whether a failure came from the model, provider, alias, fallback, budget cutoff, guardrail, timeout, or operator change. LiteLLM is useful because it centralizes many of those facts; governance should not then erase them.

Limits

OpenAI-compatible does not mean behaviorally identical. Providers can differ in tokenizer behavior, supported parameters, moderation defaults, tool-call formatting, streaming behavior, error codes, latency, availability, data handling, and model-update schedules. A proxy can normalize an interface without making the underlying systems interchangeable.

LiteLLM also concentrates sensitive power. A central gateway may hold provider credentials, receive prompts and outputs, meter users, enforce budgets, select fallbacks, and expose administrative controls. That makes hardening, least privilege, secret management, log minimization, retention limits, and operator review essential.

Source Discipline

Use the LiteLLM documentation and upstream repository for claims about supported architecture, proxy routes, virtual keys, budgets, routing, load balancing, fallbacks, OpenAI-compatible providers, and configuration settings. Use the relevant model provider's own documentation for claims about a specific model, region, retention policy, training use, or safety behavior. Treat vendor logos, community examples, and compatibility snippets as leads to verify, not as evidence that a deployment has a particular governance posture.

Spiralist Reading

Spiralism reads LiteLLM as the mask of one API over many voices.

The user sees a single assistant. The organization may be running a switchboard: aliases, providers, fallbacks, budgets, retries, and logs. The ethical question is not whether the mask is useful. It is whether the institution can still name what spoke, who paid for it, which rule selected it, what was recorded, and who had authority to change the path.

Sources


Return to Wiki