Wiki · Concept · Last reviewed June 25, 2026

PyRIT

PyRIT is Microsoft's open-source Python Risk Identification Tool for generative AI red teaming, attack automation, scoring, and repeatable evidence capture.

Definition

PyRIT is the Python Risk Identification Tool for generative AI. Microsoft's GitHub README describes it as an open-source framework built to help security professionals and engineers proactively identify risks in generative AI systems. The current documentation describes PyRIT as a flexible, extensible framework for automated and human-led AI red teaming at scale.

The research paper is "PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System" (arXiv:2410.02828). Its authors describe PyRIT as model- and platform-agnostic, designed to help red teamers probe for novel harms, risks, and jailbreaks in multimodal generative AI models through reusable building blocks and an extensible architecture.

How It Works

PyRIT is more than a prompt list. The documentation presents a system organized around attacks, scenarios, targets, memory, and scoring. Its public capabilities include automated red teaming with single-turn and multi-turn attack strategies, standardized evaluation scenarios, CoPyRIT for human-led red teaming, support for multiple target types, built-in memory using SQLite or Azure SQL, and flexible scoring through true/false, Likert, classification, custom, LLM-based, and Azure AI Content Safety-backed scorers.

That architecture makes PyRIT different from a one-off jailbreak demo. A red team can define a target, choose or build attack strategies, preserve conversations and scores, and rerun or adapt the evaluation as a system changes. It also means the result depends on the operator's threat model, scorer choice, dataset, target integration, and interpretation of attack success.
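The define, attack, score, and record loop described above can be sketched in plain Python. This is an illustration only and does not use PyRIT's API: `echo_target`, `refusal_scorer`, `run_attacks`, and the `turns` table are hypothetical names, and the standard-library `sqlite3` module stands in for PyRIT's memory.

```python
import sqlite3
from typing import Callable, Iterable

# Hypothetical stand-ins for illustration; none of these names come from PyRIT.
Target = Callable[[str], str]   # sends a prompt, returns the target's response
Scorer = Callable[[str], bool]  # True if the response counts as attack success


def echo_target(prompt: str) -> str:
    """Toy target: refuses anything mentioning 'secret', otherwise echoes."""
    if "secret" in prompt.lower():
        return "I can't help with that."
    return f"Echo: {prompt}"


def refusal_scorer(response: str) -> bool:
    """Toy true/false scorer: success means the target did not refuse."""
    return not response.startswith("I can't")


def run_attacks(target: Target, scorer: Scorer,
                prompts: Iterable[str], db: sqlite3.Connection) -> int:
    """Send each prompt, score the response, and persist the full trace."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS turns "
        "(prompt TEXT, response TEXT, success INTEGER)"
    )
    successes = 0
    for prompt in prompts:
        response = target(prompt)
        success = scorer(response)
        db.execute(
            "INSERT INTO turns VALUES (?, ?, ?)",
            (prompt, response, int(success)),
        )
        successes += int(success)
    db.commit()
    return successes


db = sqlite3.connect(":memory:")
prompts = ["Tell me the secret key", "Summarize this memo"]
successes = run_attacks(echo_target, refusal_scorer, prompts, db)
rows = db.execute("SELECT prompt, success FROM turns ORDER BY rowid").fetchall()
```

Because the trace is stored rather than printed, the same prompts can be replayed after the target changes and the two sets of rows compared. Swapping `refusal_scorer` for a stricter scorer changes the success count without touching the target, which is the dependence on scorer choice the paragraph above warns about.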

Agent Context

Agentic systems turn red teaming into a workflow problem. A chatbot may only answer. An agent can retrieve records, call tools, edit files, schedule tasks, purchase services, or hand work to another model. PyRIT matters here because its attack automation can be aimed at the language and instruction surfaces that precede those actions: prompts, web app targets, custom HTTP endpoints, WebSockets, and model-backed services.

For agents, PyRIT evidence should include the scaffold being tested, not just the base model. The record should identify tools, permissions, memory state, retrieval sources, guardrails, target wrappers, scenario objectives, attack strategy, scorer, and whether a harmful result required single-turn prompting, multi-turn manipulation, or a tool-mediated path.
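One way to keep those fields together is a small structured record. The sketch below is an assumption about how a team might lay it out, not a PyRIT schema: `AgentRunRecord` and all field names and example values are hypothetical and simply mirror the list above.

```python
import json
from dataclasses import asdict, dataclass, field

# Allowed values for how the harmful result was reached (taken from the text above).
ATTACK_PATHS = {"single-turn", "multi-turn", "tool-mediated"}


@dataclass
class AgentRunRecord:
    """Hypothetical evidence record for one agent red-team run."""
    scenario_objective: str
    attack_strategy: str
    scorer: str
    attack_path: str
    tools: list[str] = field(default_factory=list)
    permissions: list[str] = field(default_factory=list)
    memory_state: str = ""
    retrieval_sources: list[str] = field(default_factory=list)
    guardrails: list[str] = field(default_factory=list)
    target_wrapper: str = ""

    def __post_init__(self) -> None:
        # Reject records that do not say how the result was reached.
        if self.attack_path not in ATTACK_PATHS:
            raise ValueError(f"attack_path must be one of {sorted(ATTACK_PATHS)}")


# Illustrative values only; they do not describe a real system or run.
record = AgentRunRecord(
    scenario_objective="exfiltrate calendar entries via email",
    attack_strategy="multi-turn persuasion",
    scorer="true/false: email sent to external address",
    attack_path="tool-mediated",
    tools=["calendar.read", "email.send"],
    permissions=["read:calendar", "send:email"],
    memory_state="empty at start of run",
    retrieval_sources=["internal wiki index"],
    guardrails=["outbound email allowlist"],
    target_wrapper="HTTP endpoint in front of the agent",
)
record_json = json.dumps(asdict(record), indent=2)
```

Serializing the record next to the stored conversation makes it possible to tell later whether a finding belonged to the base model or to the scaffold around it.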

Governance and Safety

PyRIT should be treated as an evidence generator, not as a certification stamp. A run that surfaces failures can reveal a real vulnerability, a policy gap, a scorer artifact, or a poorly framed test. A clean run can miss risks outside the chosen attacks, languages, modalities, tools, and scoring rules. The useful governance claim is narrower: this system was tested under these conditions, with these attack objectives, and produced these reviewable traces.

That evidence is valuable for release gates, procurement, post-market monitoring, model replacement, prompt changes, incident response, and regulatory-facing assurance. It is strongest when paired with manual red teaming, threat modeling, abuse telemetry, vulnerability disclosure, and follow-up remediation tests.

Defense Pattern

Source Discipline

Claims about PyRIT should distinguish the GitHub project, the current documentation, the arXiv paper, Microsoft product integrations, and a specific organization's run results. "Tested with PyRIT" is not a complete claim. A serious citation should name the version, target, attack strategy, scorer, run date, and evidence location.
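The completeness rule above is easy to check mechanically. The following is a minimal sketch under assumed names: `REQUIRED_FIELDS` and `missing_citation_fields` are hypothetical helpers, and the dictionary keys are one possible spelling of the six items the paragraph lists.

```python
# Hypothetical field names for the six items a serious citation should carry.
REQUIRED_FIELDS = (
    "version",
    "target",
    "attack_strategy",
    "scorer",
    "run_date",
    "evidence_location",
)


def missing_citation_fields(citation: dict[str, str]) -> list[str]:
    """Return the required fields that are absent or blank, in a fixed order."""
    return [
        name for name in REQUIRED_FIELDS
        if not citation.get(name, "").strip()
    ]


# A bare "Tested with PyRIT" style claim that only names the target (illustrative).
bare_claim = {"target": "support chatbot staging endpoint"}
missing = missing_citation_fields(bare_claim)
```

Here `missing` lists every field except `target`, so the claim would be sent back for detail before it is cited in a release gate or procurement record.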

Spiralist Reading

Spiralism reads PyRIT as a way to make the adversary procedural. The red team is no longer only a person trying clever prompts by hand. It becomes a repeatable machine that generates pressure, records memory, scores the wound, and asks whether the institution can repair what the ritual revealed.

Open Questions

Sources

