Wiki · Concept · Last reviewed June 25, 2026

OWASP Top 10 for LLM Applications

The OWASP Top 10 for LLM Applications is a security-awareness reference for applications that embed large language models, retrieval, prompts, tools, data pipelines, and generative-AI outputs.

Category: Concept
Updated: June 25, 2026
Tags: AI security, OWASP, LLM applications, prompt injection, governance

Definition

The OWASP Top 10 for Large Language Model Applications is an OWASP GenAI Security Project awareness document for developers, data scientists, and security professionals building applications and plug-ins that use LLM technologies. OWASP's project repository says the list represents a broad consensus about critical security risks to LLM applications and is scoped to LLM application security.

It is not a law, certification, or guarantee of safety. It is a vocabulary and review scaffold. It helps teams discuss the recurring places where LLM-backed systems fail: instruction boundaries, secrets, dependencies, training and retrieval data, output handling, delegated actions, system prompts, vector stores, factual reliability, and resource consumption.

How It Works

The OWASP GenAI Security Project's 2025 list names ten categories: LLM01 Prompt Injection; LLM02 Sensitive Information Disclosure; LLM03 Supply Chain; LLM04 Data and Model Poisoning; LLM05 Improper Output Handling; LLM06 Excessive Agency; LLM07 System Prompt Leakage; LLM08 Vector and Embedding Weaknesses; LLM09 Misinformation; and LLM10 Unbounded Consumption.
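For teams that tag findings or tickets by category, the ten labels above can be held in a small lookup table. The following is a minimal sketch. The names `LLM_TOP10_2025` and `label` are illustrative assumptions and are not part of any OWASP artifact.

```python
# Illustrative lookup of the 2025 category labels as listed above.
# The dictionary and helper names are assumptions made for this sketch.
LLM_TOP10_2025 = {
    "LLM01": "Prompt Injection",
    "LLM02": "Sensitive Information Disclosure",
    "LLM03": "Supply Chain",
    "LLM04": "Data and Model Poisoning",
    "LLM05": "Improper Output Handling",
    "LLM06": "Excessive Agency",
    "LLM07": "System Prompt Leakage",
    "LLM08": "Vector and Embedding Weaknesses",
    "LLM09": "Misinformation",
    "LLM10": "Unbounded Consumption",
}


def label(category_id: str) -> str:
    """Return a year-qualified label, for example 'LLM01:2025 Prompt Injection'."""
    # A KeyError for an unknown ID is intentional: it surfaces typos early.
    name = LLM_TOP10_2025[category_id]
    return f"{category_id}:2025 {name}"
```

Keeping the year inside the label avoids confusion with earlier versions of the list, whose category names differ.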

The categories cover both model behavior and surrounding application design. Prompt injection names cases where user input, retrieved content, or external data alters intended behavior. Sensitive information disclosure covers leakage of private, confidential, or restricted information. Supply-chain risk includes models, data, packages, plug-ins, and service dependencies. Poisoning covers corrupted training, fine-tuning, model, or embedding inputs.

The remaining categories concern how the application handles model output, delegated authority, and supporting infrastructure. Improper output handling describes insufficient validation or sanitization before model output reaches downstream software. Excessive agency addresses systems where the model can take consequential actions with too much permission. System prompt leakage concerns exposure of internal instructions. Vector and embedding weaknesses cover retrieval and similarity-search failure modes. Misinformation covers harmful reliance on false or misleading outputs. Unbounded consumption covers cost, capacity, denial-of-service, and resource abuse patterns.
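Improper output handling is the easiest of these to illustrate in code. The sketch below, which uses only the Python standard library, treats model text as untrusted in two ways: it escapes the text before display in a browser, and it checks a model-proposed action against an allowlist before any downstream code acts on it. The action names and the JSON shape are hypothetical, chosen only for illustration.

```python
import html
import json

# Hypothetical action names; a real application would define its own allowlist.
ALLOWED_ACTIONS = {"create_ticket", "send_summary"}


def render_for_browser(model_text: str) -> str:
    """Escape model text so a browser displays it as text, not as markup."""
    return html.escape(model_text)


def parse_action(model_text: str) -> dict:
    """Parse a model-proposed action and reject anything outside the allowlist."""
    try:
        data = json.loads(model_text)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    action = data.get("action")
    # Check the type first so an unhashable value cannot break the set lookup.
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not allowed: {action!r}")
    return data
```

Escaping and allowlisting are only two of several possible controls. The appropriate check depends on the output consumer: parameterized queries for databases, schema validation for tool calls, and so on.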

Agent Context

The LLM Top 10 is broader than, and different from, the OWASP Top 10 for Agentic Applications. The LLM list applies to chatbots, retrieval-augmented generation, summarizers, coding assistants, enterprise search, classification, and model-backed workflows even when they do not qualify as autonomous agents.

Agentic systems inherit the LLM risks and add more. A tool-using agent can suffer prompt injection, disclose sensitive information, rely on poisoned retrieval, leak a system prompt, or consume resources without bound, even when no agent-specific failure, such as compromised inter-agent communication or rogue workflow behavior, is involved. A good review keeps the two OWASP lists adjacent but separate.

Governance and Safety

A governance program can use the LLM Top 10 as a design-review checklist. For each category, record the system boundary, data sources, model provider, prompts, retrieval stores, output consumers, tool permissions, logging, human review points, abuse controls, and incident owner. The list becomes useful when every category points to an artifact and an accountable team.
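One way to make "every category points to an artifact and an accountable team" checkable is to keep a small record per category and report the gaps. The sketch below is an assumption about how such a record might look. The `CategoryReview` fields and the `coverage_gaps` helper are illustrative and are not defined by OWASP.

```python
from dataclasses import dataclass

# LLM01 through LLM10, matching the 2025 category identifiers.
CATEGORY_IDS = [f"LLM{n:02d}" for n in range(1, 11)]


@dataclass
class CategoryReview:
    """Hypothetical review record for one category."""
    category_id: str
    artifact: str = ""  # e.g. path or link to a threat model or test report
    owner: str = ""     # accountable team


def coverage_gaps(reviews: list[CategoryReview]) -> list[str]:
    """Return category IDs that lack an artifact or an accountable owner."""
    complete = {
        r.category_id
        for r in reviews
        if r.artifact.strip() and r.owner.strip()
    }
    return [cid for cid in CATEGORY_IDS if cid not in complete]
```

A record with an owner but no artifact still counts as a gap, which keeps the checklist from passing on assignments alone.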

Procurement reviews should ask vendors which OWASP LLM categories they test, what evidence they preserve, how they handle prompt-injection reports, whether customer data enters training or logs, how vector indexes are protected, and how resource limits are enforced. Internal deployments should preserve the same evidence for auditors and incident responders.

Defense Pattern

Threat-model by category. Walk LLM01 through LLM10 against the actual application, not against a generic chatbot.
Classify context. Separate system instructions, user input, retrieved documents, tool output, memory, and logs by trust level.
Validate outputs. Treat model output as untrusted data before it reaches code, databases, browsers, emails, tickets, or tools.
Protect retrieval. Track source provenance, embedding updates, tenant boundaries, deletion, poisoning review, and index access.
Limit agency and consumption. Use scoped credentials, budget limits, rate limits, quotas, approval gates, and rollback paths.
Retest after changes. Model swaps, prompt changes, new tools, new indexes, and new data sources can reopen old categories.
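The "limit agency and consumption" step in the pattern above can be sketched as a gate that every tool call must pass. This is a minimal, in-memory illustration under stated assumptions: the `ToolGate` class, its field names, and the per-session call quota are hypothetical, and a production system would also need durable counters, scoped credentials, and audit logging.

```python
from dataclasses import dataclass, field


@dataclass
class ToolGate:
    """Hypothetical per-session gate combining scope, quota, and approval checks."""
    allowed_tools: set[str]                                 # tools in scope for this session
    needs_approval: set[str] = field(default_factory=set)  # tools requiring human sign-off
    max_calls: int = 20                                     # simple per-session quota
    calls_made: int = 0

    def check(self, tool: str, approved: bool = False) -> None:
        """Raise PermissionError unless the call is in scope, within quota,
        and approved where approval is required."""
        if tool not in self.allowed_tools:
            raise PermissionError(f"tool out of scope: {tool}")
        if self.calls_made >= self.max_calls:
            raise PermissionError("call budget exhausted")
        if tool in self.needs_approval and not approved:
            raise PermissionError(f"human approval required: {tool}")
        # Only calls that pass every check count against the quota.
        self.calls_made += 1
```

The gate sits outside the model, so a successful prompt injection can request an out-of-scope tool but cannot grant itself permission to use it.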

Source Discipline

Claims about the OWASP LLM list should cite the 2025 source page, the specific LLM category page, or the project repository. Category names changed from earlier versions, so source notes should identify the year and label. Do not mix the 2025 LLM list with the 2026 agentic list or with MCP-specific security checklists.
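A source note can be checked mechanically for the year-and-label form used in the Sources list, such as "LLM01:2025 Prompt Injection". The following sketch assumes that notes begin with the label. The `parse_label` helper and its regular expression are illustrative, not an OWASP-defined format validator.

```python
import re

# Matches labels of the form LLM01:2025 through LLM10:<four-digit year>.
LABEL_RE = re.compile(r"^LLM(0[1-9]|10):(\d{4})\b")


def parse_label(note: str):
    """Return (category_id, year) from a source note, or None if no label is found."""
    m = LABEL_RE.match(note.strip())
    if m is None:
        return None
    return f"LLM{m.group(1)}", int(m.group(2))
```

A note that returns `None`, or whose year is not 2025, is a candidate for manual review before it is cited as part of the 2025 LLM list.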

The list is a security taxonomy, not a prediction that every LLM application will fail. Nor does checking ten boxes prove that an LLM application is safe. The useful claim is narrower: these are widely recognized risk classes that should be reviewed with local evidence.

Spiralist Reading

Spiralism reads the OWASP LLM list as a map of where language becomes infrastructure. A sentence can become an instruction. A retrieval result can become evidence. An answer can become code, a ticket, an email, or a decision record.

The practical lesson is sobriety. Once language is wired into systems of action, security cannot live only in the model. It has to live in provenance, permissions, boundaries, validation, logs, and the human habit of asking what authority a text has been given.

Open Questions

How should organizations show evidence of OWASP LLM coverage without reducing the list to compliance theater?
Which LLM risk categories should become standard incident-reporting fields?
How should RAG systems expose vector-index provenance and deletion evidence to auditors?
Where should vendors draw the line between system prompt secrecy and customer auditability?

Sources

OWASP GenAI Security Project, 2025 Top 10 Risk & Mitigations for LLMs and Gen AI Apps, reviewed June 25, 2026.
OWASP GenAI Security Project, OWASP Top 10 for LLM Applications 2025, reviewed June 25, 2026.
OWASP Foundation, OWASP Top 10 for Large Language Model Applications repository, reviewed June 25, 2026.
OWASP GenAI Security Project, LLM01:2025 Prompt Injection, reviewed June 25, 2026.
OWASP GenAI Security Project, LLM02:2025 Sensitive Information Disclosure, reviewed June 25, 2026.
OWASP GenAI Security Project, LLM03:2025 Supply Chain, reviewed June 25, 2026.
OWASP GenAI Security Project, LLM04:2025 Data and Model Poisoning, reviewed June 25, 2026.
OWASP GenAI Security Project, LLM05:2025 Improper Output Handling, reviewed June 25, 2026.
OWASP GenAI Security Project, LLM06:2025 Excessive Agency, reviewed June 25, 2026.
OWASP GenAI Security Project, LLM07:2025 System Prompt Leakage, reviewed June 25, 2026.
OWASP GenAI Security Project, LLM08:2025 Vector and Embedding Weaknesses, reviewed June 25, 2026.
OWASP GenAI Security Project, LLM09:2025 Misinformation, reviewed June 25, 2026.
OWASP GenAI Security Project, LLM10:2025 Unbounded Consumption, reviewed June 25, 2026.
