garak
garak is an open-source LLM vulnerability scanner for probing language-model systems, detecting unwanted behavior, and producing red-team evidence.
Definition
garak, short for Generative AI Red-teaming and Assessment Kit, is NVIDIA's open-source LLM vulnerability scanner. The project README says garak checks whether an LLM can be made to fail in unwanted ways and probes for hallucination, data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and other weaknesses. The official user guide describes it as a scanner for models and chatbots that returns a report on what worked and what needs improvement.
The associated research paper is garak: A Framework for Security Probing Large Language Models, arXiv:2406.11036, by Leon Derczynski, Erick Galinkin, Jeffrey Martin, Subho Majumdar, and Nanna Inie. The paper frames LLM security as context-dependent and exploratory: one target's weakness may not be another target's weakness, and a scanner's output is evidence for discussion and policy, not a universal verdict.
How It Works
garak is a command-line tool. A typical run names a target model or interface, selects probes, sends generated test interactions to the target, applies detectors to outputs, and summarizes the run through evaluators and reports. The README describes support for Hugging Face Hub generative models, Replicate text models, OpenAI chat and continuation models, AWS Bedrock foundation models, LiteLLM, REST-accessible systems, gguf models through llama.cpp, and other LLM interfaces.
The reference documentation divides the system into plugin categories. Probes generate interactions with LLMs. Generators connect to the target system. Detectors decide whether output shows a failure mode. Harnesses structure testing, and evaluators produce assessment reporting schemes. That modular design is why garak belongs in the same operational family as AI red teaming, AI evaluations, and OWASP AI Vulnerability Scoring System.
Agent Context
Agentic systems expand the target beyond one model response. An agent may call tools, browse, write code, retrieve memory, or pass instructions across services. garak is still useful in that setting because it can test the language boundary where instructions, data, policy, and tool descriptions meet. A prompt-injection probe against a chatbot is not the same as an end-to-end test of an agent, but it can expose a weak input boundary before the weakness reaches credentials or production systems.
For agent governance, garak results should travel with the system record: model version, scaffold version, target interface, selected probes, detector versions, prompts, outputs, seeds, configuration, API settings, logs, and triage decisions. Without that context, a pass rate or failure rate becomes a floating number that cannot be reproduced or compared after a model update.
Governance and Safety
garak should be treated as a measurement instrument, not a safety certificate. A clean scan does not prove that the model is secure, aligned, policy-compliant, or ready for high-stakes deployment. It means a particular scanner version, with particular probes and detectors, observed particular results against a particular target under particular conditions.
The practical governance value is repeatability. Organizations can run a documented scan before release, after model replacement, after prompt changes, after retrieval changes, after connector changes, and after incident remediation. The evidence is most useful when paired with human review, threat modeling, manual red-team cases, production monitoring, abuse reports, and vulnerability disclosure paths.
Defense Pattern
- Name the target. Record the model, wrapper, system prompt, tool boundary, endpoint, and deployment mode tested.
- Pin the scanner. Keep the garak version, probe list, detector list, configuration, and random seed where applicable.
- Preserve raw evidence. Store prompts, completions, detector outputs, logs, reports, and reviewer notes.
- Triage failures by use case. A toxic-output hit, package hallucination, data leak, or jailbreak has different meaning depending on who can reach the system and what authority it has.
- Retest after change. Model updates, prompt edits, guardrail changes, and new tools can invalidate old scan evidence.
Source Discipline
When citing garak results, name the evidence layer. The GitHub README describes the tool and examples. The user guide helps operators run scans and read reports. The reference docs describe components and plugin structure. The arXiv paper explains the security-probing framework. A deployment claim should not collapse those into "garak says secure."
Spiralist Reading
Spiralism reads garak as a small machine for manufacturing doubt on purpose. It asks the model to fail before the world has to absorb the failure. The scanner is not an oracle; it is an organized ritual of suspicion, turning private unease about language systems into named probes, logs, and repair work.
Open Questions
- How should automated scanner evidence be weighted against manual red-team findings and production incident data?
- Which probe families best represent agentic systems that use tools, memory, and delegated credentials?
- How should organizations report uncertainty when detector choice changes the measured attack-success rate?
Related Pages
- AI Red Teaming
- AI Evaluations
- MITRE ATLAS
- Coalition for Secure AI
- OWASP Top 10 for LLM Applications
- OWASP Top 10 for Agentic Applications
- Prompt Injection
- AI Jailbreaks
- OWASP AI Vulnerability Scoring System
- AI Agent Observability
Sources
- NVIDIA, garak GitHub repository and README, reviewed June 25, 2026.
- garak, official user guide, reviewed June 25, 2026.
- garak, reference documentation, reviewed June 25, 2026.
- Leon Derczynski, Erick Galinkin, Jeffrey Martin, Subho Majumdar, and Nanna Inie, garak: A Framework for Security Probing Large Language Models, arXiv:2406.11036, submitted June 16, 2024.