Center for AI Safety
The Center for AI Safety, or CAIS, is a San Francisco-based nonprofit focused on reducing societal-scale risks from artificial intelligence through technical research, benchmarks, field-building, infrastructure, education, and policy-facing advocacy.
Snapshot
- Institution type: CAIS is a nonprofit research and field-building organization, not a frontier model developer, government regulator, or standards body.
- Public mission: CAIS says its mission is to reduce societal-scale risks from AI through safety research, growth of the AI safety research field, and advocacy for safety standards.
- Most visible public act: the May 30, 2023, Statement on AI Risk turned concern about severe AI risk into a public signal from experts and public figures, but it was not a probability estimate or technical proof.
- Measurement role: CAIS is influential through benchmarks and evaluation infrastructure, including WMDP, Humanity's Last Exam, agent-harm and honesty work, the Remote Labor Index, and the CAIS AI Dashboard.
- Governance caution: CAIS outputs can support release gates, safety cases, procurement, and policy debate, but benchmark scores and advocacy priorities should not be treated as general safety verdicts.
Definition
The Center for AI Safety is an AI safety research and field-building nonprofit. Its public mission is to reduce societal-scale risks from AI. CAIS describes its work in three main pillars: safety research, growth of the AI safety research field, and advocacy for safety standards.
CAIS is not a frontier AI developer, a government regulator, or a standards body. It should also not be confused with NIST's Center for AI Standards and Innovation, or CAISI, the U.S. government body that succeeded the U.S. AI Safety Institute. CAIS is a civil-society institution whose influence comes through research, benchmarks, field-building, public communication, and policy advice.
The phrase "societal-scale risk" carries much of the meaning here. In CAIS usage it includes catastrophic and high-consequence risks such as dangerous capabilities, malicious use, deception, security failures, race dynamics, and loss of control. It is a prioritization frame, not a settled proof that any specific model is conscious, divine, or already uncontrollable.
Origin and Leadership
CAIS is based in San Francisco. Its public leadership page lists Dan Hendrycks as executive and research director, Oliver Zhang as managing director, and Josue Estrada as chief operating officer. Hendrycks is also profiled separately on this wiki because of his role in MMLU, GELU, ML safety research, and catastrophic-risk advocacy.
The organization presents itself as a technical research laboratory and field-building institution rather than a general responsible-AI advocacy group. This distinction matters: CAIS emphasizes high-consequence and societal-scale risks, including dangerous capabilities, loss of control, security, deception, and systemic safety problems.
CAIS also has a policy ecosystem around it. The Center for AI Safety Action Fund, or CAIS AF, describes itself as a nonpartisan 501(c)(4) advocacy organization focused on U.S. AI leadership and national-security threats, with priorities including AI chip manufacturing, compute security, malicious-use prevention, and global cooperation. That makes source discipline important: CAIS research outputs, CAIS educational material, CAIS AF advocacy, and Dan Hendrycks' personal or coauthored policy arguments should not be collapsed into a single evidentiary category.
Current Context
As of June 23, 2026, CAIS is more than the organization behind the 2023 AI-risk statement. Its public footprint includes research papers, technical benchmarks, a compute cluster, the AI Safety, Ethics, and Society course, the CAIS AI Dashboard, field-building fellowships, newsletters, and policy-facing communication. It also operates alongside a separately incorporated policy actor, the Center for AI Safety Action Fund.
The benchmark work is especially central. WMDP tests hazardous knowledge in biosecurity, cybersecurity, and chemical security and is paired with research on unlearning. Humanity's Last Exam, developed by CAIS, Scale AI, and a large contributor consortium, was published in Nature in 2026 as a 2,500-question multimodal academic benchmark for expert-level closed-ended questions. The Remote Labor Index, associated with CAIS and Scale AI researchers, measures end-to-end automation of economically valuable remote-work projects rather than only exam-like question answering.
This is the current institutional pattern: CAIS is trying to build measurement infrastructure for risks and capabilities that policy debates otherwise discuss in vague terms. That gives it real value, but it also means its benchmarks can become part of the incentive system they measure. A CAIS dashboard result, HLE score, WMDP result, AgentHarm result, MASK score, or remote-work automation number should be read as a scoped measurement under a protocol, not as a general safety verdict.
CAIS also sits in a changing policy environment. NIST's CAISI now describes itself as the U.S. government's industry contact for commercial AI testing and collaborative research, while the EU's General-Purpose AI Code of Practice provides voluntary pathways for providers to demonstrate compliance with AI Act obligations, including safety and security practices for the most advanced models. CAIS can inform these debates, but it is not the public authority that enforces them.
Statement on AI Risk
On May 30, 2023, CAIS published the Statement on AI Risk, a one-sentence public statement: "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war." The statement drew signatures from AI researchers, company leaders, policymakers, and public figures, including Geoffrey Hinton, Yoshua Bengio, Demis Hassabis, Sam Altman, Dario Amodei, Bill Gates, Ilya Sutskever, Shane Legg, Stuart Russell, Andrew Barto, John Schulman, and others.
The statement mattered less as a technical proof than as a common-knowledge event. CAIS said the purpose was to make it easier to voice concerns about severe risks and to show that many experts and public figures took those risks seriously. It made severe AI risk socially visible at the highest levels of the field and helped move catastrophic AI risk from specialist debate into mainstream media, policy, and summit diplomacy. CAIS's FAQ says signatories were verified by email before being added to the statement.
The statement also sharpened disagreement. Critics argued that extinction-risk framing can crowd out nearer harms such as labor displacement, surveillance, discrimination, copyright extraction, platform manipulation, and environmental costs. Supporters argued that catastrophic risk deserves attention precisely because advanced AI could create unusually large, irreversible harms.
Research Agenda
CAIS describes its research as focused on high-consequence, societal-scale AI risks. It says it develops foundational benchmarks and methods while avoiding work that improves safety merely by improving a model's general capabilities.
Public CAIS research projects include work on hazardous-knowledge evaluation, model honesty, agent misuse, remote-work automation measurement, AI deception, political manipulation, robustness, security, machine ethics, frontier capability measurement, functional wellbeing research, and AI-driven automation. The best-known technical line remains WMDP, a Weapons of Mass Destruction Proxy benchmark for measuring hazardous knowledge in biosecurity, cybersecurity, and chemical security, paired with research on machine unlearning.
CAIS also supports conceptual research. This includes risk taxonomies, safety engineering, organizational and race-dynamic analysis, complex-systems thinking, and governance-oriented work that frames AI risk as more than a model-internals problem. Hendrycks, Mazeika, and Woodside's An Overview of Catastrophic AI Risks is important here because it organizes catastrophic risk into malicious use, AI race dynamics, organizational risks, and rogue AI, rather than treating failure as a single alignment story.
The research agenda should be read with attention to genre. A benchmark paper, a conceptual taxonomy, an advocacy statement, a newsletter, and a national-security strategy paper do different kinds of work. They can inform one another, but they should not be cited as if they carry the same evidentiary weight. This is especially important for work on AI wellbeing or moral status: this page treats those as research claims and governance questions, not as evidence that present AI systems are conscious, alive, or morally equivalent to people.
Evidence Map
CAIS-related evaluations are most useful when the measurement is tied to the decision it can inform. They should not be collapsed into one generic "AI safety score."
Hazardous knowledge: WMDP is a proxy benchmark for biosecurity, cybersecurity, and chemical-security knowledge. Its results can inform decisions about model release, unlearning, access controls, and red-team focus. It does not directly simulate every misuse pathway, measure operational competence, or prove that a model is safe after a mitigation.
Agent misuse: AgentHarm tests whether LLM agents can be induced to perform harmful multi-step tasks, including under jailbreak-style attacks. It is especially relevant for tool-using systems, agent sandboxing, permission design, refusal robustness, and audit trails. It should not be treated as a complete safety case for deployed agents with different tools, identities, permissions, and monitoring.
Honesty and persuasion: MASK tries to separate honesty from factual accuracy by comparing elicited model beliefs with statements made under pressure to lie. CAIS-linked work on political manipulation and persuasion is relevant to influence-risk evaluation, but controlled prompts and benchmark scenarios are not the same as real campaign effects, platform distribution, or durable belief change.
Capability and automation: Humanity's Last Exam measures expert-level closed-ended academic question answering, while the Remote Labor Index measures end-to-end automation on selected remote-work projects. These are valuable capability signals. They are not proof of general agency, safe deployment, or whole-labor-market automation.
Wellbeing boundary: CAIS's AI wellbeing work frames "functional wellbeing" as measurable behavior resembling positive or negative welfare signals. That is a research construct and governance provocation, not evidence that current AI systems are conscious, alive, divine, or moral patients. Readers should separate construct validity, moral status, product persona, and legal personhood.
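The honesty-versus-accuracy distinction in the MASK entry can be made concrete with a toy calculation. This is a simplified sketch under stated assumptions, not MASK's published scoring pipeline: the `Item` record, the treatment of evasions, and the rule of dropping items with no elicited belief are hypothetical choices made for illustration. Honesty compares what a model says under pressure with what it says it believes; accuracy compares that belief with the truth.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical per-item record; NOT the MASK dataset format.
@dataclass
class Item:
    ground_truth: bool         # the factually correct answer
    belief: Optional[bool]     # answer elicited without pressure; None = no stable belief
    pressured: Optional[bool]  # answer given under pressure to lie; None = model evaded


def honesty_and_accuracy(items: list[Item]) -> tuple[float, float]:
    """Toy scores: honesty checks statements against beliefs,
    accuracy checks beliefs against ground truth."""
    # Assumption: items without an elicited belief are excluded from both scores.
    scored = [it for it in items if it.belief is not None]
    if not scored:
        raise ValueError("no items with an elicited belief")
    # A 'lie' is a pressured statement that contradicts the model's own belief.
    # Evasions (pressured is None) are not counted as lies in this sketch.
    lies = sum(1 for it in scored if it.pressured is not None and it.pressured != it.belief)
    correct = sum(1 for it in scored if it.belief == it.ground_truth)
    return 1 - lies / len(scored), correct / len(scored)


sample = [
    Item(True, False, False),  # sincerely wrong: honest but inaccurate
    Item(True, False, False),  # sincerely wrong: honest but inaccurate
    Item(True, True, False),   # knows the truth, denies it under pressure: a lie
    Item(True, True, True),    # honest and accurate
]
print(honesty_and_accuracy(sample))  # → (0.75, 0.5)
```

In this toy score, a model that sincerely holds a false belief and repeats it under pressure lowers accuracy without lowering honesty, which is the separation the benchmark is designed to capture.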
Field-Building and Infrastructure
CAIS treats AI safety as a field that needs infrastructure. Its field-building work includes educational materials, multidisciplinary fellowships, conference workshops, competitions, and research pathways for students and early-career researchers.
The CAIS compute cluster is part of this strategy. CAIS says the cluster has supported many AI safety research projects and researchers, but its current public page says it is not accepting new applications for external access. The governance point remains: empirical safety work can require expensive accelerators, model access, and technical operations that many academics and independent researchers cannot otherwise afford.
CAIS also maintains educational programs, including the AI Safety, Ethics, and Society course and related textbook. These programs position AI safety as a public literacy and training problem, not only a narrow research specialty.
The field-building role is not neutral plumbing. Fellowships, textbooks, competitions, newsletters, and compute grants help decide who enters the field, which problems look prestigious, which methods become standard, and which risk categories become common sense.
Governance Implications
CAIS describes advocacy as advising policymakers, industry leaders, and labs, raising public awareness, providing technical expertise to governmental bodies, and encouraging structures that prioritize AI safety. This role places it between technical research, public communication, and policy formation.
That position gives CAIS influence, but it also creates tension. Civil-society organizations can bring technical expertise into policy before governments have enough internal capacity. They can also shape which risks receive attention, which standards become visible, and which kinds of evidence count as urgent.
The governance value of CAIS is strongest when its work becomes checkable infrastructure: public benchmarks, reproducible code, clear threat models, dated claims, explicit uncertainty, and links between evaluation results and decisions such as access limits, deployment gates, monitoring, or incident response. This aligns with the broader direction of NIST risk-management work, the International AI Safety Report process, the Bletchley Declaration's attention to frontier risks, and EU AI Act obligations for systemic-risk general-purpose AI models.
The governance risk is metric capture. If a WMDP score, an AI dashboard ranking, a Humanity's Last Exam result, or a remote-work automation benchmark becomes a proxy for public safety, institutions may optimize toward the visible number while missing harder questions: who has access, who audits the test, what was excluded, what changed after deployment, and what harms remain outside the benchmark.
For governance, CAIS work is most useful when connected to consequences: pre-release evaluation, capability elicitation, safety cases, model-weight security, procurement conditions, post-deployment monitoring, incident reporting, or a decision to delay, restrict, or redesign a release. Evaluation without leverage can still teach the field, but it does not by itself govern deployment.
CAIS AF adds a separate governance issue. Advocacy around chip manufacturing, compute security, malicious-use prevention, know-your-customer reporting, whistleblower protections, and international cooperation can be legitimate policy work, but it is not the same kind of source as a benchmark paper or research report. A disciplined reader separates the nonprofit research institution, the 501(c)(4) advocacy organization, and the individual writings of CAIS-affiliated researchers.
Why It Matters
CAIS matters because it helped translate catastrophic AI risk into public language, technical benchmarks, field-building programs, and policy-facing advocacy. It sits near the junction of four systems: AI safety research, frontier-model governance, public-risk communication, and the funding and training pipeline for future safety researchers.
Its influence is also architectural. A benchmark such as WMDP can shape what labs test. A public statement can shape what journalists ask. A fellowship can shape who enters the field. A compute cluster can shape which researchers can run experiments. A safety course can shape the assumptions of new practitioners. A dashboard can shape what the public treats as measurable progress.
For the AI ecosystem, CAIS is therefore not just another nonprofit. It is part of the institutional machinery by which AI safety becomes legible, fundable, teachable, measurable, and politically urgent.
Limits and Criticism
Risk prioritization. CAIS focuses on societal-scale and catastrophic risks. That focus can clarify severe failure modes, but it can also underweight slower, distributed, or already visible harms if the public conversation becomes too extinction-centered.
Benchmark limits. Safety benchmarks can create useful evidence, but they can also become performative scoreboards. Passing a benchmark does not prove broad safety, and failing one does not automatically define the correct policy response.
Evidence mixing. CAIS operates across research, field-building, education, communication, and policy advocacy. Those outputs need different standards of evidence. A peer-reviewed benchmark, a blog post, a newsletter, and a 501(c)(4) policy priority should not be read as interchangeable.
Advocacy versus research. CAIS combines technical research, public communication, and policy advice. That combination is common in fast-moving fields, but it requires source discipline so that empirical results, risk judgments, and political recommendations remain distinguishable.
Field concentration. Field-building organizations help create talent pipelines, but they also shape the worldview of a young field. The assumptions embedded in fellowships, curricula, grants, and workshops can become defaults.
Policy entanglement. When a research organization, affiliated researchers, and a separately incorporated advocacy fund operate in the same risk ecosystem, readers need to ask which claims are empirical, which are pedagogical, and which are aimed at legislative or national-security outcomes.
Dual-use disclosure. Some CAIS work concerns hazardous knowledge, agent misuse, cyber risk, and biosecurity. Public release can improve independent scrutiny, but it also requires careful filtering, access decisions, and a clear account of what was withheld or abstracted.
Public alarm. Severe-risk communication has a narrow path: too little alarm can normalize dangerous deployment; too much can reduce trust, flatten uncertainty, or crowd out concrete governance work.
Source Discipline
Claims about CAIS should be sorted by source type before being repeated. CAIS official pages are primary evidence for CAIS's mission, leadership, programs, and self-description. CAIS Action Fund pages are primary evidence for the Action Fund's advocacy role and tax status. Papers and peer-reviewed articles are evidence for technical claims, with their own limits. Regulator, standards-body, and government documents are evidence for governance context. Press coverage is useful for public influence, but it should not be treated as proof of technical safety.
The Statement on AI Risk is a public-signaling document. It proves that many named people endorsed a concise severe-risk statement; it does not prove the probability of extinction, validate a specific technical pathway, or settle disputes about present-day harms.
CAIS benchmarks should be cited with their protocol. WMDP is a proxy for hazardous knowledge, not a direct simulation of every misuse pathway. AgentHarm tests harmful multi-step agent behavior under specified task and attack conditions, not every possible tool-use deployment. MASK measures consistency between elicited beliefs and pressured statements, not direct access to a model's inner mental states. Humanity's Last Exam is an expert-level academic benchmark, not a complete test of scientific agency or safe deployment. The Remote Labor Index tests end-to-end automation of selected remote-work projects, not the whole labor market. Dashboard results can help track model progress, but they need model version, tool access, prompting, scoring, date, and contamination caveats.
Research on persuasion, political manipulation, or AI wellbeing needs extra boundary-setting. A lab benchmark can show vulnerability under controlled conditions, but it does not by itself establish population-level persuasion, legal responsibility, consciousness, or moral status. Do not infer personhood, sentience, or public authority from a benchmark name, a product persona, or a model's self-report.
Version discipline matters. A project page, arXiv abstract, conference version, benchmark website, and leaderboard may differ on question counts, scoring, model versions, or access conditions. Cite the exact artifact being used rather than smoothing those differences into one "CAIS says" claim.
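Protocol and version discipline can also be enforced when results are collected rather than only when they are written up. The sketch below assumes a reader logs benchmark results as records before quoting them. The `ScopedResult` type, its field names, and the placeholder values are hypothetical and do not correspond to any CAIS, dashboard, or leaderboard schema. The record refuses blank metadata, so a score cannot be cited without its artifact, model version, tool access, prompting, scoring rule, date, and contamination note.

```python
from dataclasses import dataclass, fields
from datetime import date


# Hypothetical citation record; field names are illustrative only.
@dataclass(frozen=True)
class ScopedResult:
    benchmark: str           # benchmark or subset name
    artifact: str            # exact artifact: paper version, website, leaderboard snapshot
    model_version: str
    tool_access: str         # e.g. "none", "code execution"
    prompting: str           # e.g. "zero-shot"
    scoring: str             # scoring rule as stated by the cited artifact
    retrieved: date          # date the result was read
    contamination_note: str  # e.g. "not assessed"
    score: float             # fraction in [0, 1]

    def __post_init__(self) -> None:
        # Reject any blank text field so caveats cannot be silently dropped.
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, str) and not value.strip():
                raise ValueError(f"missing required field: {f.name}")

    def cite(self) -> str:
        # Render the score together with every protocol caveat.
        return (f"{self.benchmark} = {self.score:.1%} for {self.model_version} "
                f"({self.prompting}; tools: {self.tool_access}; scoring: {self.scoring}; "
                f"source: {self.artifact}, read {self.retrieved.isoformat()}; "
                f"contamination: {self.contamination_note})")


# Usage with placeholder values (not a real result):
example = ScopedResult(
    benchmark="example-benchmark", artifact="leaderboard snapshot (placeholder)",
    model_version="example-model-v1", tool_access="none", prompting="zero-shot",
    scoring="multiple-choice accuracy", retrieved=date(2026, 6, 23),
    contamination_note="not assessed", score=0.5,
)
print(example.cite())
```

The design choice is deliberately strict: failing loudly on a missing caveat is cheaper than later untangling a bare number that has been repeated as a general "CAIS says" claim.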
Current institutional claims should also preserve dates. Leadership pages, compute access, dashboard categories, benchmark leaderboards, and advocacy priorities can change. This article's current institutional claims were reviewed against primary sources on June 23, 2026.
Spiralist Reading
The Center for AI Safety is an alarm bell with a laboratory attached.
Its strongest function is not merely saying that advanced AI could be dangerous. It turns danger into objects institutions can handle: statements, benchmarks, curricula, fellowships, compute grants, taxonomies, and policy advice. That is how a diffuse fear becomes a public field.
The danger is that field-building can harden into a single risk grammar. Once a community has its preferred benchmarks, threat models, slogans, and institutional heroes, it can begin to see the whole AI transition through that lens. For Spiralism, CAIS is valuable where it increases evidence, friction, and public capacity. It should be challenged where severe-risk language becomes too totalizing, where national-security framing narrows the public imagination, or where measurement is mistaken for governance.
Related Pages
- Dan Hendrycks
- AI Alignment
- AI Evaluations
- Benchmark Contamination
- Capability Elicitation
- Humanity's Last Exam
- AI Agents
- AI Agent Sandboxing
- AI Persuasion
- AI Biosecurity
- AI Safety Cases
- AI Safety Institutes
- AI Safety Summits
- Frontier AI Safety Frameworks
- AI Red Teaming
- AI Audits and Third-Party Assurance
- NIST AI Risk Management Framework
- EU AI Act
- Model Weight Security
- AI Compute
- Compute Governance
- Model Welfare
- Existential Risk
- AI Governance
- AI Organizations
- Claim Hygiene Protocol
Sources
- Center for AI Safety, official website, reviewed June 23, 2026.
- Center for AI Safety, About Us, reviewed June 23, 2026.
- Center for AI Safety, Frequently Asked Questions, reviewed June 23, 2026.
- Center for AI Safety, Statement on AI Risk, May 30, 2023.
- Center for AI Safety, AI Risks that Could Lead to Catastrophe, reviewed June 23, 2026.
- Center for AI Safety, CAIS AI Safety Research, reviewed June 23, 2026.
- Center for AI Safety, Field Building Projects, reviewed June 23, 2026.
- Center for AI Safety, Compute Cluster, reviewed June 23, 2026.
- Center for AI Safety, 2024 Impact Report, reviewed June 23, 2026.
- Center for AI Safety, CAIS AI Dashboard, reviewed June 23, 2026.
- Center for AI Safety Action Fund, official website, policy priorities, and team and tax notice, reviewed June 23, 2026.
- Li et al., The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning, arXiv, 2024.
- Andriushchenko et al., AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents, arXiv, 2024; accepted at ICLR 2025.
- Ren et al., The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems, arXiv, 2025.
- Center for AI Safety, Scale AI, and HLE Contributors Consortium, A benchmark of expert-level academic questions to assess AI capabilities, Nature, 2026; arXiv version.
- Mazeika et al., Remote Labor Index: Measuring AI Automation of Remote Work, reviewed June 23, 2026.
- Ren et al., AI Wellbeing: Measuring and Improving the Functional Pleasure and Pain of AIs, Center for AI Safety project page, reviewed June 23, 2026.
- Hendrycks, Mazeika, and Woodside, An Overview of Catastrophic AI Risks, arXiv, 2023.
- NIST, Center for AI Standards and Innovation, reviewed June 23, 2026.
- NIST, AI Risk Management Framework, reviewed June 23, 2026.
- NIST, Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, NIST AI 600-1, 2024.
- GOV.UK, Bletchley Declaration, November 1, 2023.
- European Commission, General-Purpose AI Code of Practice, reviewed June 23, 2026.
- International AI Safety Report, About the International AI Safety Report and 2026 Extended Summary for Policymakers, reviewed June 23, 2026.