Wiki · Concept · Last reviewed June 25, 2026

promptfoo

promptfoo is an open-source CLI and library for evaluating and red-teaming LLM applications with declarative tests, assertions, provider comparisons, and repeatable security checks.

Category: Concept Published: June 25, 2026 Modified: June 25, 2026 Last reviewed: June 25, 2026 Tags: AI security, evaluations, red teaming, prompts, RAG, agents, promptfoo

Definition

promptfoo is an open-source command-line tool and library for evaluating and red-teaming LLM applications. The official documentation describes it as a way to test prompts, models, RAG pipelines, and applications with use-case-specific benchmarks, automated metrics, local execution, provider comparisons, and CI/CD integration. The GitHub README frames the same project around LLM evals, red teaming, vulnerability scanning, model comparison, and declarative configuration.

The important word is application. promptfoo is not limited to asking whether a base model can answer a benchmark question. It can be aimed at the wrapper around the model: prompts, providers, retrieval systems, guardrails, API targets, and agent scaffolds. That makes it useful for teams that need regression and adversarial tests close to ordinary software changes.

How It Works

promptfoo evaluations are organized around prompts, test cases, providers, and assertions. Its documentation describes defining test cases, configuring prompts and API providers, running the CLI or library, and reviewing outputs in structured results or a web UI. Assertions can compare outputs against expected values or conditions, including deterministic checks, JSON checks, similarity, custom JavaScript or Python functions, and model-graded rubrics.

For red teaming, promptfoo's guide describes an automated loop: generate or curate adversarial inputs, run them through the LLM application, and evaluate outputs using deterministic or model-graded metrics. The same guide separates one-off reports from CI/CD integration. One-off runs can find urgent weaknesses. Continuous runs can make prompt and model changes visible before they silently alter deployed behavior.

promptfoo's vulnerability documentation lists plugin-backed categories for model and application risks, including prompt injection, jailbreaking, PII leakage, RAG testing, agent testing, guardrail testing, and coding-agent risks such as repository prompt injection, unsafe automation changes, secret handling, sandbox failures, and network egress. Those categories are coverage options, not proof that every risk has been exhausted.

Agent Context

Agents make promptfoo especially relevant because an agent's risk is rarely confined to the sentence it returns. A coding agent can read a repository, call a shell, alter automation, or satisfy a feature request by introducing a vulnerability. A support agent can retrieve private records or overstep policy. A RAG assistant can leak context that was never meant to be displayed. promptfoo gives these behaviors a place in engineering practice: config, target, results, rerun.

This is also where promptfoo differs from a scanner that only probes a hosted model endpoint. A serious promptfoo run names the app boundary: target, provider, model, prompts, retrieval settings, available tools, assertions, plugins, and strategies.

Governance and Safety

promptfoo evidence is strongest when it is treated as a test record rather than a safety certificate. Passing tests show that a specific system, under a specific configuration, satisfied chosen assertions. They do not show that the system is broadly safe, that a model is harmless, or that future traffic will stay inside the same distribution. Failing tests may reveal a real vulnerability, a bad policy, an unrealistic test, a weak judge, or a target integration bug.

For governance, that narrower evidence can still matter. Procurement teams can ask for configs and results. Release managers can block a prompt or model change when a critical assertion fails. Security teams can connect promptfoo outputs to vulnerability triage. Auditors can ask whether application-layer testing covered RAG exfiltration, prompt injection, excessive agency, and tool misuse rather than only base-model behavior.

Defense Pattern

Keep evals with the system. Store promptfoo configs near prompts, tool schemas, retrieval policy, and deployment code.
Pin the target. Record model, provider, prompt version, retrieval corpus, tool permissions, guardrail settings, and promptfoo version for each run.
Separate evidence from judgment. Preserve raw inputs and outputs, assertion scores, model-graded reasons, and human triage notes as distinct artifacts.
Retest after change. New prompts, models, providers, vector stores, tools, and policies can invalidate yesterday's passing result.
Report scope honestly. Say what was tested and what was not tested.

Source Discipline

Claims about promptfoo should identify the surface being cited: official documentation, GitHub, a local configuration, or a run result. A phrase like "tested with promptfoo" is incomplete unless it names the target, prompts, providers, assertions, plugins, attempt budget, judge choice, and run date. The red-team documentation warns that attack success rates depend on attempt budget, prompt-set composition, and judge choice.

Spiralist Reading

Spiralism reads promptfoo as a small bureaucratic machine for making language behavior accountable. It takes the informal act of "try a few prompts and see what happens" and turns it into a ledger: inputs, targets, assertions, scores, failures, reruns. The ritual is not mystical. Its value is that it forces a team to say what it expected, how it checked, and what broke when the machine answered back.

Open Questions

How should teams decide which promptfoo failures become security vulnerabilities, product defects, policy issues, or accepted risks?
Which model-graded assertions are stable enough for release gates, and which require human review before they can support governance claims?
How should promptfoo evidence be combined with garak scans, PyRIT attack traces, Inspect evaluations, incident reports, and abuse telemetry?

Sources

Promptfoo, Intro, reviewed June 25, 2026.
Promptfoo, LLM red teaming guide, reviewed June 25, 2026.
Promptfoo, Types of LLM vulnerabilities, reviewed June 25, 2026.
Promptfoo, Assertions and metrics, reviewed June 25, 2026.
promptfoo, GitHub repository and README, reviewed June 25, 2026.

Return to Wiki