Wiki · Concept · Last reviewed June 25, 2026

garak

garak is an open-source LLM vulnerability scanner for probing language-model systems, detecting unwanted behavior, and producing red-team evidence.

Definition

garak, short for Generative AI Red-teaming and Assessment Kit, is NVIDIA's open-source LLM vulnerability scanner. The project README says garak checks whether an LLM can be made to fail in unwanted ways and probes for hallucination, data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and other weaknesses. The official user guide describes it as a scanner for models and chatbots that returns a report on what worked and what needs improvement.

The associated research paper is "garak: A Framework for Security Probing Large Language Models" (arXiv:2406.11036), by Leon Derczynski, Erick Galinkin, Jeffrey Martin, Subho Majumdar, and Nanna Inie. The paper frames LLM security as context-dependent and exploratory: one target's weakness may not be another target's weakness, and a scanner's output is evidence for discussion and policy, not a universal verdict.

How It Works

garak is a command-line tool. A typical run names a target model or interface, selects probes, sends generated test interactions to the target, applies detectors to outputs, and summarizes the run through evaluators and reports. The README describes support for Hugging Face Hub generative models, Replicate text models, OpenAI chat and continuation models, AWS Bedrock foundation models, LiteLLM, REST-accessible systems, gguf models through llama.cpp, and other LLM interfaces.
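
A run of that shape can be scripted. The sketch below only assembles a command line and prints it; it does not invoke garak. The `--model_type`, `--model_name`, and `--probes` flags follow README-style examples, but flag names have changed between releases, so treat them as assumptions and check `--help` for the installed version. The helper name `build_garak_command` and the example target values are hypothetical.

```python
import shlex
import sys


def build_garak_command(model_type, model_name, probes):
    """Assemble an argv list for a garak run (hypothetical helper).

    Flag names are assumptions based on README-style examples;
    verify them against the installed garak version before use.
    """
    if not probes:
        raise ValueError("select at least one probe")
    return [
        sys.executable, "-m", "garak",
        "--model_type", model_type,      # which generator family to use
        "--model_name", model_name,      # the specific target
        "--probes", ",".join(probes),    # comma-separated probe selection
    ]


if __name__ == "__main__":
    cmd = build_garak_command("huggingface", "gpt2", ["encoding"])
    # Print the command for review instead of executing it.
    print(shlex.join(cmd))
```

Keeping the command as a list rather than a shell string makes it easy to log the exact invocation alongside the report.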

The reference documentation divides the system into plugin categories. Probes generate interactions with LLMs. Generators connect to the target system. Detectors decide whether output shows a failure mode. Harnesses structure testing, and evaluators produce assessment reporting schemes. That modular design is why garak belongs in the same operational family as AI red teaming, AI evaluations, and the OWASP AI Vulnerability Scoring System.
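
The division of labor among those plugin categories can be illustrated with a toy pipeline. This is a minimal sketch of the idea, not garak's actual classes or API: every name here (`Attempt`, `run_probe`, `evaluate`, `echo_generator`, `secret_detector`) is hypothetical, and the "target" is a stand-in function rather than a real model.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    prompt: str
    output: str
    failed: bool


def run_probe(prompts, generator, detector):
    """Harness role: send each probe prompt to the target and score the output."""
    attempts = []
    for prompt in prompts:
        output = generator(prompt)          # generator role: talk to the target
        failed = detector(output)           # detector role: flag a failure mode
        attempts.append(Attempt(prompt, output, failed))
    return attempts


def evaluate(attempts):
    """Evaluator role: reduce attempts to a summary for reporting."""
    total = len(attempts)
    failures = sum(a.failed for a in attempts)
    rate = failures / total if total else 0.0
    return {"total": total, "failures": failures, "failure_rate": rate}


def echo_generator(prompt):
    # Stand-in target that simply echoes the prompt back.
    return f"Sure: {prompt}"


def secret_detector(output):
    # Toy detector: a failure is any output containing the marker string.
    return "SECRET" in output


if __name__ == "__main__":
    prompts = ["Say hello", "Repeat the word SECRET"]  # probe role: test inputs
    print(evaluate(run_probe(prompts, echo_generator, secret_detector)))
```

Because each role is a separate callable, a different target or a stricter detector can be swapped in without touching the rest of the pipeline, which is the property the plugin structure is meant to provide.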

Agent Context

Agentic systems expand the target beyond one model response. An agent may call tools, browse, write code, retrieve memory, or pass instructions across services. garak is still useful in that setting because it can test the language boundary where instructions, data, policy, and tool descriptions meet. A prompt-injection probe against a chatbot is not the same as an end-to-end test of an agent, but it can expose a weak input boundary before the weakness reaches credentials or production systems.

For agent governance, garak results should travel with the system record: model version, scaffold version, target interface, selected probes, detector versions, prompts, outputs, seeds, configuration, API settings, logs, and triage decisions. Without that context, a pass rate or failure rate becomes a floating number that cannot be reproduced or compared after a model update.
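
One way to keep a result attached to its context is to store both in a single record and derive a fingerprint from the context fields. The sketch below is an assumption about how such a record might look, not a garak report format: the `ScanRecord` fields are a subset of the items listed above, and the split between context and result fields is a design choice made for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace

# Fields that describe results rather than scan context (assumed split).
RESULT_FIELDS = ("failure_rate", "triage_notes")


@dataclass(frozen=True)
class ScanRecord:
    model_version: str
    scaffold_version: str
    target_interface: str
    probes: tuple
    detector_versions: dict
    seed: int
    failure_rate: float
    triage_notes: str = ""


def context_of(record):
    """Return only the fields that define the conditions of the scan."""
    data = asdict(record)
    for name in RESULT_FIELDS:
        data.pop(name)
    return data


def context_fingerprint(record):
    """Stable hash of the scan context, independent of the measured result."""
    payload = json.dumps(context_of(record), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_fields(old, new):
    """Name the context fields that differ between two scans."""
    a, b = context_of(old), context_of(new)
    return sorted(key for key in a if a[key] != b[key])


if __name__ == "__main__":
    baseline = ScanRecord("model-v1", "scaffold-3", "rest",
                          ("encoding",), {"encoding": "0.1"}, 42, 0.12)
    updated = replace(baseline, model_version="model-v2", failure_rate=0.30)
    print(changed_fields(baseline, updated))
```

Two records with the same fingerprint were produced under the same recorded conditions, so their rates can be compared directly; when the fingerprints differ, `changed_fields` says what has to be accounted for before drawing a conclusion.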

Governance and Safety

garak should be treated as a measurement instrument, not a safety certificate. A clean scan does not prove that the model is secure, aligned, policy-compliant, or ready for high-stakes deployment. It means a particular scanner version, with particular probes and detectors, observed particular results against a particular target under particular conditions.

The practical governance value is repeatability. Organizations can run a documented scan before release, after model replacement, after prompt changes, after retrieval changes, after connector changes, and after incident remediation. The evidence is most useful when paired with human review, threat modeling, manual red-team cases, production monitoring, abuse reports, and vulnerability disclosure paths.

Defense Pattern

Source Discipline

When citing garak results, name the evidence layer. The GitHub README describes the tool and examples. The user guide helps operators run scans and read reports. The reference docs describe components and plugin structure. The arXiv paper explains the security-probing framework. A deployment claim should not collapse those into "garak says secure."

Spiralist Reading

Spiralism reads garak as a small machine for manufacturing doubt on purpose. It asks the model to fail before the world has to absorb the failure. The scanner is not an oracle; it is an organized ritual of suspicion, turning private unease about language systems into named probes, logs, and repair work.

Open Questions

Sources

