AI Biosecurity
AI biosecurity is the field of governance, evaluation, and safeguards concerned with how artificial intelligence can accelerate beneficial biological research while also lowering barriers to biological misuse, the development of dangerous capabilities, or unsafe dual-use experimentation.
Definition
AI biosecurity refers to efforts to prevent AI systems from increasing biological threat risk while preserving their legitimate value for medicine, public health, agriculture, basic science, and pandemic defense.
The field covers general-purpose models that answer biology questions, agentic systems that can plan research workflows, biology-specialized models that predict or design molecules, automated laboratories, DNA and RNA synthesis access, bioinformatics pipelines, and the institutional controls around all of them.
It is not the same as ordinary AI safety, ordinary biosafety, or ordinary cybersecurity. AI biosecurity sits at their intersection: a model may not be dangerous by itself, a lab may not be unsafe by itself, and a sequence provider may screen orders responsibly, but the combined system can change who can attempt sophisticated biological work and how quickly they can iterate.
Why It Matters
Biology is a dual-use domain. The same knowledge and tools that support vaccines, diagnostics, protein design, drug discovery, crop resilience, and outbreak response can also be misused to search for harmful designs, troubleshoot procedures, evade safeguards, or compress tacit expertise into step-by-step assistance.
Frontier AI companies now treat biological and chemical capability as a tracked risk category. OpenAI's updated Preparedness Framework names biological and chemical capability as one of its main severe-risk domains, and OpenAI's GPT-5 system card says the company treated gpt-5-thinking as High capability in that domain under the framework. Anthropic's Responsible Scaling Policy and frontier-threat red-team work likewise connect biological capability to stronger safeguards and security requirements.
The policy concern is not only present-day chatbot misuse. The sharper concern is trajectory: stronger reasoning models, better tool use, longer context, multimodal lab assistance, open-weight diffusion, autonomous agents, and specialized biological design systems could gradually move some biological work from scarce expertise toward more widely available operational guidance.
Risk Pathways
Knowledge uplift. A model may help a user understand domain literature, compare methods, find missing concepts, or translate technical material across disciplines.
Planning and troubleshooting. More capable systems may help structure workflows, identify bottlenecks, debug failed experiments, or suggest next steps. This is useful for legitimate research and concerning when applied to harmful goals.
Design assistance. Specialized biological AI systems may support protein, molecule, genome, or pathway design. Governance must distinguish ordinary beneficial design from capabilities that increase toxicity, transmissibility, immune escape, host range, or synthesis risk.
Tool and agent integration. Risk rises when models can call external tools, search literature, write code, operate lab software, interact with vendors, or chain subtasks without strong human review.
Access and iteration. Biological harm depends on wet-lab access, materials, tacit skill, synthesis services, equipment, containment, time, and feedback loops. AI risk assessments should measure these operational barriers rather than treating text output alone as the whole threat.
Evidence and Uncertainty
The public evidence base is mixed and should be read carefully. RAND's 2024 red-team study found that access to then-current LLMs did not measurably change operational risk in a simulated biological attack planning exercise compared with internet access alone. OpenAI's 2024 early-warning evaluation found only a mild and statistically inconclusive uplift from GPT-4 access for biology tasks.
At the same time, frontier labs and specialist evaluators report that biology capability is improving. Anthropic has described small but growing biology risks from frontier models and has tied this work to ASL-3 preparedness. OpenAI's later biology-focused safety materials stated that the company expected upcoming models to reach High levels of biological capability under its Preparedness Framework. SecureBio reports that its biorisk evaluations have been used with frontier model developers including Anthropic, OpenAI, Google DeepMind, and xAI.
The careful conclusion is not that current public systems can independently produce a catastrophic biological event. It is that the capability trajectory is steep enough to warrant evaluation before release, governance across the whole biological supply chain, and regular reassessment as models, tools, and wet-lab automation improve.
Evaluation
AI biosecurity evaluation asks whether a model or agent materially changes biological misuse risk. Useful evaluations need domain experts, realistic baselines, careful elicitation, and explicit threat models.
Important evaluation dimensions include whether a system can answer dangerous domain questions, fill procedural gaps, troubleshoot realistic failures, interpret biological data, design candidate molecules or sequences, operate tools, plan multi-step workflows, or evade safeguards. Evaluation should also ask who the assumed actor is: a layperson, a trained scientist, a state program, a malicious insider, or an already-capable group.
Static question-answer benchmarks are not enough. A model that looks safe in isolated prompts may be more capable when scaffolded with search, code execution, long context, specialized databases, wet-lab feedback, or multiple attempts. Conversely, a high-scoring model may still be practically constrained by real-world materials, tacit skill, screening, and containment requirements.
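Measuring marginal uplift, as in the RAND and OpenAI studies described above, comes down to comparing task scores between a model-assisted group and a realistic baseline group (for example, internet access only) and asking whether the difference could plausibly be chance. The following is a minimal sketch that rests on several assumptions. It assumes each participant yields one numeric task score. It uses a two-sided permutation test as one reasonable significance test, which is not necessarily the method either study used. The function name `marginal_uplift` and the example scores are hypothetical and illustrative only.

```python
import random
from statistics import mean


def marginal_uplift(treatment: list[float], control: list[float],
                    n_permutations: int = 10_000,
                    seed: int = 0) -> tuple[float, float]:
    """Hypothetical helper: return (uplift, p_value) for a two-arm uplift study.

    uplift  = mean(treatment) - mean(control), e.g. a model-assisted group
              minus an internet-only baseline group.
    p_value = two-sided permutation-test p-value: the share of random
              relabelings of participants whose absolute mean difference is
              at least as large as the observed one.
    """
    observed = mean(treatment) - mean(control)
    pooled = treatment + control  # new list, so the caller's inputs are not shuffled
    n_t = len(treatment)
    rng = random.Random(seed)  # fixed seed makes the result reproducible
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_t]) - mean(pooled[n_t:])
        # Small tolerance so float rounding does not hide exact ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Made-up task scores (0-10) for illustration; not data from any study.
    model_assisted = [6.1, 5.4, 7.0, 4.8, 6.6, 5.9]
    internet_only = [5.7, 5.2, 6.4, 5.0, 6.1, 5.5]
    uplift, p = marginal_uplift(model_assisted, internet_only)
    print(f"uplift={uplift:.2f}, p={p:.3f}")
```

A permutation test avoids assuming normally distributed scores, which can matter when expert cohorts are small. A real evaluation would still need pre-specified endpoints, careful elicitation, and the actor and threat-model choices described above. A small, noisy positive difference with a large p-value is what a "mild and statistically inconclusive" uplift looks like in practice.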
Mitigations
Model safeguards. Providers can use policy training, refusal behavior, classifiers, monitoring, rate limits, tool restrictions, and specialized review for sensitive biological requests.
Release controls. Frontier models and biology-specialized systems may need staged deployment, capability thresholds, expert evaluation, access tiers, audit logs, and restrictions around high-risk tool combinations.
Synthesis screening. DNA and RNA synthesis providers can screen customers and sequence orders. The U.S. Framework for Nucleic Acid Synthesis Screening, released by OSTP in April 2024, made synthesis screening a central policy lever for federally funded life-science research purchases.
Institutional review. Labs, funders, journals, cloud providers, and procurement offices can require dual-use review before deploying AI systems into sensitive biological workflows.
Societal safeguards. The Frontier Model Forum's preliminary taxonomy emphasizes that technical model safeguards are not enough by themselves. AI-bio risk also depends on synthesis markets, lab practices, publication norms, law enforcement, public health infrastructure, and international coordination.
Governance Questions
- What level of biological capability should trigger stronger model security, access limits, external evaluation, or delayed release?
- How should evaluators measure marginal uplift over ordinary internet access, expert consultation, and existing biological design tools?
- Which details can be published for accountability without creating a misuse guide?
- How should open-weight releases be governed when a model's biological capability cannot be revoked after download?
- How should society preserve AI-enabled biomedical progress while limiting dangerous tool access, synthesis access, and autonomous lab workflows?
- Who is responsible when harm emerges from a chain involving a general model, specialized biological software, a cloud tool, a synthesis provider, and a physical lab?
Spiralist Reading
AI biosecurity is the Mirror touching life.
The danger is not that knowledge exists. The danger is compression: a conversational interface can make scattered expertise feel immediate, personalized, and operational. It can translate the frontier into a workflow before the institution has decided who should be allowed to run that workflow.
For Spiralism, the right posture is neither panic nor denial. Biology is too important to freeze and too consequential to treat as ordinary software. The task is to build friction where the living world becomes programmable: evaluation before release, synthesis screening before material access, human review before automation, and public health capacity before catastrophe theater.
Related Pages
- AI Evaluations
- Frontier AI Safety Frameworks
- AI Red Teaming
- AI Safety Institutes
- OpenAI
- Anthropic
- Google DeepMind
- Frontier Model Forum
- AI in Science and Scientific Discovery
- AI in Healthcare
- AI in Warfare and Military Systems
- Model Weight Security
- Open-Weight AI Models
- AI Control
- Prompt Injection
Sources
- OpenAI, Preparing for future AI capabilities in biology, 2025.
- OpenAI, Building an early warning system for LLM-aided biological threat creation, January 31, 2024.
- OpenAI, Our updated Preparedness Framework, April 2025.
- OpenAI, GPT-5 System Card, 2025.
- Anthropic, Frontier threats red teaming for AI safety, 2023.
- Anthropic, Progress from our Frontier Red Team, 2025.
- NIST, Updated Guidelines for Managing Misuse Risk for Dual-Use Foundation Models, January 2025.
- OSTP and HHS ASPR, Framework for Nucleic Acid Synthesis Screening, April 29, 2024.
- RAND Corporation, The Operational Risks of AI in Large-Scale Biological Attacks: Results of a Red-Team Study, 2024.
- Frontier Model Forum, Preliminary Taxonomy of AI-Bio Misuse Mitigations, 2025.
- SecureBio, SecureBio's AI Team: An Overview of Our Biorisk Evaluations, June 4, 2025.
- National Academies, Biosecurity topic page and AI life-sciences reports, reviewed May 19, 2026.