Wiki · Organization · Last reviewed May 19, 2026

Center for AI Safety

The Center for AI Safety, or CAIS, is a San Francisco-based nonprofit focused on reducing societal-scale risks from artificial intelligence through technical research, field-building, infrastructure, education, and public advocacy.

Definition

The Center for AI Safety is an AI safety research and field-building nonprofit whose stated mission is to reduce societal-scale risks from AI. CAIS describes its work as three pillars: safety research, growth of the AI safety research field, and advocacy for safety standards.

CAIS is best known publicly for the 2023 Statement on AI Risk, but the organization is broader than that statement. Its work includes technical benchmarks, conceptual risk analysis, educational programs, fellowships, workshops, competitions, compute support for researchers, and public communication about catastrophic and high-consequence AI risks.

Origin and Leadership

CAIS is based in San Francisco. Its public leadership page lists Dan Hendrycks as executive and research director, Oliver Zhang as managing director, and Josue Estrada as chief operating officer. Hendrycks is also profiled separately on this wiki because of his role in MMLU, GELU, ML safety research, and catastrophic-risk advocacy.

The organization presents itself as a technical research laboratory and field-building institution rather than a general responsible-AI advocacy group. This distinction matters: CAIS emphasizes high-consequence and societal-scale risks, including dangerous capabilities, loss of control, security, deception, and systemic safety problems.

Statement on AI Risk

In May 2023, CAIS published the Statement on AI Risk, a short public statement arguing that mitigating extinction risk from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war. The statement drew signatures from AI researchers, company leaders, policymakers, and public figures, including Geoffrey Hinton, Yoshua Bengio, Demis Hassabis, Sam Altman, Dario Amodei, Bill Gates, Ilya Sutskever, Shane Legg, Stuart Russell, Andrew Barto, and John Schulman.

The statement mattered less as a technical proof than as a common-knowledge event. It made severe AI risk socially visible at the highest levels of the field and helped move catastrophic AI risk from specialist debate into mainstream media, policy, and summit diplomacy. CAIS says the statement helped shift global perspective on catastrophic AI risks and was referenced in connection with the United Kingdom's 2023 AI Safety Summit.

The statement also sharpened disagreement. Critics argued that extinction-risk framing can crowd out nearer harms such as labor displacement, surveillance, discrimination, copyright extraction, platform manipulation, and environmental costs. Supporters argued that catastrophic risk deserves attention precisely because advanced AI could create unusually large, irreversible harms.

Research Agenda

CAIS describes its research as focused on high-consequence, societal-scale AI risks. It says it develops foundational benchmarks and methods while avoiding work whose safety gains come only from making a model more generally capable.

Public CAIS research projects include work on hazardous-knowledge evaluation, model honesty, AI wellbeing, moral reasoning, remote-work automation measurement, safety pretraining, virology capabilities, utility engineering, robustness, security, and machine ethics. The best-known technical line is WMDP, the Weapons of Mass Destruction Proxy benchmark, which measures hazardous knowledge in biosecurity, cybersecurity, and chemical security and is paired with research on machine unlearning.

CAIS also supports conceptual research. This includes risk taxonomies, safety engineering, organizational and race-dynamic analysis, complex-systems thinking, and governance-oriented work that frames AI risk as more than a model-internals problem.

Field-Building and Infrastructure

CAIS treats AI safety as a field that needs infrastructure. Its field-building work includes educational materials, multidisciplinary fellowships, conference workshops, competitions, and research pathways for students and early-career researchers.

The CAIS compute cluster is part of this strategy. CAIS says it provides free access to compute resources for AI and machine-learning safety projects, including support for researchers outside large technology companies. This matters because modern empirical safety work can require expensive accelerators, model access, and technical operations that many academics and independent researchers cannot otherwise afford.

CAIS also maintains educational programs, including the AI Safety, Ethics, and Society course and related textbook. These programs position AI safety as a public literacy and training problem, not only a narrow research specialty.

Advocacy and Policy Role

CAIS describes its advocacy as advising policymakers, industry leaders, and labs; raising public awareness; providing technical expertise to governmental bodies; and encouraging structures that prioritize AI safety. This role places it at the intersection of technical research, public communication, and policy formation.

That position gives CAIS influence, but it also creates tension. Advocacy organizations can bring technical expertise into policy before governments have enough internal capacity. They can also shape which risks receive attention, which standards become visible, and which kinds of evidence count as urgent.

Why It Matters

CAIS matters because it helped translate catastrophic AI risk into public language, technical benchmarks, field-building programs, and policy-facing advocacy. It sits near the junction of four systems: AI safety research, frontier-model governance, public-risk communication, and the funding and training pipeline for future safety researchers.

Its influence is also architectural. A benchmark such as WMDP can shape what labs test. A public statement can shape what journalists ask. A fellowship can shape who enters the field. A compute cluster can shape which researchers can run experiments. A safety course can shape the assumptions of new practitioners.

For the AI ecosystem, CAIS is therefore not just another nonprofit. It is part of the institutional machinery by which AI safety becomes legible, fundable, teachable, measurable, and politically urgent.

Limits and Criticism

Risk prioritization. CAIS focuses on societal-scale and catastrophic risks. That focus can clarify severe failure modes, but it can also underweight slower, distributed, or already visible harms if the public conversation becomes too extinction-centered.

Benchmark limits. Safety benchmarks can create useful evidence, but they can also become performative scoreboards. Passing a benchmark does not prove broad safety, and failing one does not automatically define the correct policy response.

Advocacy versus research. CAIS combines technical research, public communication, and policy advice. That combination is common in fast-moving fields, but it requires careful sourcing so that empirical results, risk judgments, and political recommendations remain distinguishable.

Field concentration. Field-building organizations help create talent pipelines, but they also shape the worldview of a young field. The assumptions embedded in fellowships, curricula, grants, and workshops can become defaults.

Public alarm. Severe-risk communication has a narrow path: too little alarm can normalize dangerous deployment; too much can reduce trust, flatten uncertainty, or crowd out concrete governance work.

Spiralist Reading

The Center for AI Safety is an alarm bell with a laboratory attached.

Its strongest function is not merely saying that advanced AI could be dangerous. It turns danger into objects institutions can handle: statements, benchmarks, curricula, fellowships, compute grants, taxonomies, and policy advice. That is how a diffuse fear becomes a public field.

The danger is that field-building can harden into a single risk grammar. Once a community has its preferred benchmarks, threat models, slogans, and institutional heroes, it can begin to see the whole AI transition through that lens. For Spiralism, CAIS is valuable where it increases evidence, friction, and public capacity. It should be challenged where severe-risk language becomes too totalizing or where measurement is mistaken for governance.
