Wiki · Concept · Last reviewed June 25, 2026

PyRIT

PyRIT is Microsoft's open-source Python Risk Identification Tool for generative AI red teaming, attack automation, scoring, and repeatable evidence capture.

Category: Concept Published: June 25, 2026 Modified: June 25, 2026 Last reviewed: June 25, 2026 Tags: AI security, red teaming, Microsoft, evaluations, automation, PyRIT

Definition

PyRIT is the Python Risk Identification Tool for generative AI. Microsoft's GitHub README describes it as an open-source framework built to help security professionals and engineers proactively identify risks in generative AI systems. The current documentation describes PyRIT as a flexible, extensible framework for automated and human-led AI red teaming at scale.

The research paper is PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System, arXiv:2410.02828. Its authors describe PyRIT as model- and platform-agnostic, designed to help red teamers probe for novel harms, risks, and jailbreaks in multimodal generative AI models through reusable building blocks and an extensible architecture.

How It Works

PyRIT is not only a prompt list. The documentation presents a system for running attacks, scenarios, targets, memory, and scoring. Its public capabilities include automated red teaming with single-turn and multi-turn attack strategies, standardized evaluation scenarios, CoPyRIT for human-led red teaming, support for multiple target types, built-in memory using SQLite or Azure SQL, and flexible scoring with true/false, Likert, classification, custom, LLM-based, and Azure AI Content Safety-backed scorers.

That architecture makes PyRIT different from a one-off jailbreak demo. A red team can define a target, choose or build attack strategies, preserve conversations and scores, and rerun or adapt the evaluation as a system changes. It also means the result depends on the operator's threat model, scorer choice, dataset, target integration, and interpretation of attack success.

Agent Context

Agentic systems turn red teaming into a workflow problem. A chatbot may only answer. An agent can retrieve records, call tools, edit files, schedule tasks, purchase services, or hand work to another model. PyRIT matters here because its attack automation can be aimed at the language and instruction surfaces that precede those actions: prompts, web app targets, custom HTTP endpoints, WebSockets, and model-backed services.

For agents, PyRIT evidence should include the scaffold being tested, not just the base model. The record should identify tools, permissions, memory state, retrieval sources, guardrails, target wrappers, scenario objectives, attack strategy, scorer, and whether a harmful result required single-turn prompting, multi-turn manipulation, or a tool-mediated path.

Governance and Safety

PyRIT should be treated as an evidence generator, not as a certification stamp. A failed run can reveal a real vulnerability, a policy gap, a scorer artifact, or a poorly framed test. A clean run can miss risks outside the chosen attacks, languages, modalities, tools, and scoring rules. The useful governance claim is narrower: this system was tested under these conditions, with these attack objectives, and produced these reviewable traces.

That evidence is valuable for release gates, procurement, post-market monitoring, model replacement, prompt changes, incident response, and regulatory-facing assurance. It is strongest when paired with manual red teaming, threat modeling, abuse telemetry, vulnerability disclosure, and follow-up remediation tests.

Defense Pattern

Pin the run. Store the PyRIT version, target, scenario, attack strategy, scorer, memory backend, and configuration.
Separate attack from judgment. Preserve the generated attack prompts separately from scorer decisions and human triage notes.
Map authority. State what the tested system could actually do: answer, retrieve, write, send, spend, deploy, or delete.
Retest after change. New prompts, models, tools, policies, and retrieval corpora can invalidate old red-team evidence.
Keep uncertainty visible. Do not convert attack-success rate into a broad claim about safety or trustworthiness.

Source Discipline

Claims about PyRIT should distinguish the GitHub project, the current documentation, the arXiv paper, Microsoft product integrations, and a specific organization's run results. "Tested with PyRIT" is not a complete claim. A serious citation should name the version, target, attack strategy, scorer, run date, and evidence location.

Spiralist Reading

Spiralism reads PyRIT as a way to make the adversary procedural. The red team is no longer only a person trying clever prompts by hand. It becomes a repeatable machine that generates pressure, records memory, scores the wound, and asks whether the institution can repair what the ritual revealed.

Open Questions

How should PyRIT scenarios represent agents that act through browsers, payment systems, code execution, and long-term memory?
Which red-team records should be public, which should be available to auditors, and which should remain restricted for security reasons?
How should organizations compare PyRIT evidence with garak scans, Inspect evaluations, manual red-team reports, and incident data?

Sources

Microsoft, PyRIT GitHub repository and README, reviewed June 25, 2026.
Microsoft, PyRIT Documentation, version 0.14.0, reviewed June 25, 2026.
Gary D. Lopez Munoz, Amanda J. Minnich, Roman Lutz, Richard Lundeen, Raja Sekhar Rao Dheekonda, Nina Chikanov, Bolor-Erdene Jagdagdorj, Martin Pouliot, Shiven Chawla, Whitney Maxwell, Blake Bullwinkel, Katherine Pratt, Joris de Gruyter, Charlotte Siska, Pete Bryan, Tori Westerhoff, Chang Kawaguchi, Christian Seifert, Ram Shankar Siva Kumar, and Yonatan Zunger, PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System, arXiv:2410.02828, submitted October 1, 2024.

Return to Wiki