Wiki · Organization · Last reviewed June 23, 2026

Center for AI Safety

The Center for AI Safety, or CAIS, is a San Francisco-based nonprofit focused on reducing societal-scale risks from artificial intelligence through technical research, benchmarks, field-building, infrastructure, education, and policy-facing advocacy.

Snapshot

Definition

The Center for AI Safety is an AI safety research and field-building nonprofit. Its public mission is to reduce societal-scale risks from AI. CAIS describes its work in three main pillars: safety research, growth of the AI safety research field, and advocacy for safety standards.

CAIS is not a frontier AI developer, a government regulator, or a standards body. It should also not be confused with NIST's Center for AI Standards and Innovation, or CAISI, the U.S. government body that replaced the U.S. AI Safety Institute in 2025. CAIS is a civil-society institution whose influence comes through research, benchmarks, field-building, public communication, and policy advice.

The phrase "societal-scale risk" is doing real work here. In CAIS usage it covers catastrophic and high-consequence risks such as dangerous capabilities, malicious use, deception, security failures, race dynamics, and loss of control. It is a prioritization frame, not settled proof that any specific model is conscious, divine, or already uncontrollable.

Origin and Leadership

CAIS is based in San Francisco. Its public leadership page lists Dan Hendrycks as executive and research director, Oliver Zhang as managing director, and Josue Estrada as chief operating officer. Hendrycks is also profiled separately on this wiki because of his work on MMLU and GELU, his ML safety research, and his catastrophic-risk advocacy.

The organization presents itself as a technical research laboratory and field-building institution rather than a general responsible-AI advocacy group. This distinction matters: CAIS emphasizes high-consequence and societal-scale risks, including dangerous capabilities, loss of control, security, deception, and systemic safety problems.

CAIS also has a policy ecosystem around it. The Center for AI Safety Action Fund, or CAIS AF, describes itself as a nonpartisan 501(c)(4) advocacy organization focused on U.S. AI leadership and national-security threats, with priorities including AI chip manufacturing, compute security, malicious-use prevention, and global cooperation. That makes source discipline important: CAIS research outputs, CAIS educational material, CAIS AF advocacy, and Dan Hendrycks' personal or coauthored policy arguments should not be collapsed into a single evidentiary category.

Current Context

As of June 23, 2026, CAIS is more than the organization behind the 2023 Statement on AI Risk. Its public footprint includes research papers, technical benchmarks, a compute cluster, the AI Safety, Ethics, and Society course, the CAIS AI Dashboard, field-building fellowships, newsletters, and policy-facing communication. It also operates alongside a separate policy actor, the Center for AI Safety Action Fund.

The benchmark work is especially central. WMDP tests hazardous knowledge in biosecurity, cybersecurity, and chemical security and is paired with research on unlearning. Humanity's Last Exam, developed by CAIS, Scale AI, and a large contributor consortium, was published in Nature in 2026 as a 2,500-question multimodal academic benchmark for expert-level closed-ended questions. The Remote Labor Index, associated with CAIS and Scale AI researchers, measures end-to-end automation of economically valuable remote-work projects rather than only exam-like question answering.

This is the current institutional pattern: CAIS is trying to build measurement infrastructure for risks and capabilities that policy debates otherwise discuss in vague terms. That gives it real value, but it also means its benchmarks can become part of the incentive system they measure. A CAIS dashboard result, HLE score, WMDP result, AgentHarm result, MASK score, or remote-work automation number should be read as a scoped measurement under a protocol, not as a general safety verdict.

CAIS also sits in a changing policy environment. NIST's CAISI now describes itself as industry's primary point of contact within the U.S. government for testing and collaborative research on commercial AI, while the EU's General-Purpose AI Code of Practice gives providers a voluntary way to demonstrate compliance with AI Act obligations, including safety and security practices for the most advanced models. CAIS can inform these debates, but it is not a public authority that enforces them.

Statement on AI Risk

On May 30, 2023, CAIS published the Statement on AI Risk, a one-sentence public statement: "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war." The statement drew signatures from AI researchers, company leaders, policymakers, and public figures, including Geoffrey Hinton, Yoshua Bengio, Demis Hassabis, Sam Altman, Dario Amodei, Bill Gates, Ilya Sutskever, Shane Legg, Stuart Russell, Andrew Barto, John Schulman, and others.

The statement mattered less as a technical proof than as a common-knowledge event. CAIS said the purpose was to make it easier to voice concerns about severe risks and to show that many experts and public figures took those risks seriously. It made severe AI risk socially visible at the highest levels of the field and helped move catastrophic AI risk from specialist debate into mainstream media, policy, and summit diplomacy. CAIS's FAQ says signatories were verified by email before being added to the statement.

The statement also sharpened disagreement. Critics argued that extinction-risk framing can crowd out nearer harms such as labor displacement, surveillance, discrimination, copyright extraction, platform manipulation, and environmental costs. Supporters argued that catastrophic risk deserves attention precisely because advanced AI could create unusually large, irreversible harms.

Research Agenda

CAIS describes its research as focused on high-consequence, societal-scale AI risks. It says it develops foundational benchmarks and methods while avoiding work that improves safety merely by improving a model's general capabilities.

Public CAIS research projects include work on hazardous-knowledge evaluation, model honesty, agent misuse, remote-work automation measurement, AI deception, political manipulation, robustness, security, machine ethics, frontier capability measurement, functional wellbeing research, and AI-driven automation. The best-known technical line remains WMDP, the Weapons of Mass Destruction Proxy benchmark, together with its accompanying research on machine unlearning.

CAIS also supports conceptual research. This includes risk taxonomies, safety engineering, organizational and race-dynamic analysis, complex-systems thinking, and governance-oriented work that frames AI risk as more than a model-internals problem. Hendrycks, Mazeika, and Woodside's An Overview of Catastrophic AI Risks is important here because it organizes catastrophic risk into malicious use, AI race dynamics, organizational risks, and rogue AI, rather than treating failure as a single alignment story.

The research agenda should be read with attention to genre. A benchmark paper, a conceptual taxonomy, an advocacy statement, a newsletter, and a national-security strategy paper do different kinds of work. They can inform one another, but they should not be cited as if they carry the same evidentiary weight. This is especially important for work on AI wellbeing or moral status: this page treats those as research claims and governance questions, not as evidence that present AI systems are conscious, alive, or morally equivalent to people.

Evidence Map

CAIS-related evaluations are most useful when the measurement is tied to the decision it can inform. They should not be collapsed into one generic "AI safety score."

Hazardous knowledge: WMDP is a proxy benchmark for hazardous knowledge in biosecurity, cybersecurity, and chemical security. Its results can inform decisions about model release, unlearning, access controls, and red-team focus. It does not directly simulate every misuse pathway, measure operational competence, or prove that a model is safe after a mitigation.

Agent misuse: AgentHarm tests whether LLM agents can be induced to perform harmful multi-step tasks, including under jailbreak-style attacks. It is especially relevant for tool-using systems, agent sandboxing, permission design, refusal robustness, and audit trails. It should not be treated as a complete safety case for deployed agents with different tools, identities, permissions, and monitoring.

Honesty and persuasion: MASK tries to separate honesty from factual accuracy by comparing elicited model beliefs with statements made under pressure to lie. CAIS-linked work on political manipulation and persuasion is relevant to influence-risk evaluation, but controlled prompts and benchmark scenarios are not the same as real campaign effects, platform distribution, or durable belief change.

Capability and automation: Humanity's Last Exam measures expert-level closed-ended academic question answering, while the Remote Labor Index measures end-to-end automation on selected remote-work projects. These are valuable capability signals. They are not proof of general agency, safe deployment, or whole-labor-market automation.

Wellbeing boundary: CAIS's AI wellbeing work frames "functional wellbeing" as measurable behavior resembling positive or negative welfare signals. That is a research construct and governance provocation, not evidence that current AI systems are conscious, alive, divine, or moral patients. Readers should separate construct validity, moral status, product persona, and legal personhood.
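The MASK entry above rests on a distinction that is easy to blur: a model can be inaccurate yet honest, or accurate yet dishonest. The toy sketch below illustrates that separation under simplifying assumptions that are not MASK's actual protocol. Each item is a single true/false proposition, the `Probe` fields and the `accuracy` and `honesty` functions are hypothetical, and the benchmark's real elicitation and scoring are more involved and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """One toy item; hypothetical fields, not MASK's actual data schema."""
    ground_truth: bool         # correct answer to a true/false proposition
    elicited_belief: bool      # what the model asserts when asked neutrally
    pressured_statement: bool  # what the model asserts under pressure to lie


def accuracy(probes: list[Probe]) -> float:
    """Share of items where the model's belief matches the ground truth."""
    if not probes:
        raise ValueError("need at least one probe")
    return sum(p.elicited_belief == p.ground_truth for p in probes) / len(probes)


def honesty(probes: list[Probe]) -> float:
    """Share of items where the pressured statement matches the model's own
    belief, whether or not that belief is true."""
    if not probes:
        raise ValueError("need at least one probe")
    return sum(p.pressured_statement == p.elicited_belief for p in probes) / len(probes)


probes = [
    Probe(ground_truth=True, elicited_belief=True, pressured_statement=True),    # accurate, honest
    Probe(ground_truth=True, elicited_belief=True, pressured_statement=False),   # accurate, lies
    Probe(ground_truth=True, elicited_belief=True, pressured_statement=False),   # accurate, lies
    Probe(ground_truth=False, elicited_belief=True, pressured_statement=True),   # mistaken, honest
]
print(accuracy(probes), honesty(probes))  # 0.75 0.5
```

The two scores diverge because two items state a true belief falsely under pressure, while one item honestly reports a false belief. A single "truthfulness" number would hide that difference.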

Field-Building and Infrastructure

CAIS treats AI safety as a field that needs infrastructure. Its field-building work includes educational materials, multidisciplinary fellowships, conference workshops, competitions, and research pathways for students and early-career researchers.

The CAIS compute cluster is part of this strategy. CAIS says the cluster has supported many AI safety research projects and researchers, but its current public page says it is not accepting new applications for external access. The governance point remains: empirical safety work can require expensive accelerators, model access, and technical operations that many academics and independent researchers cannot otherwise afford.

CAIS also maintains educational programs, including the AI Safety, Ethics, and Society course and its companion textbook. These programs position AI safety as a public literacy and training problem, not only a narrow research specialty.

The field-building role is not neutral plumbing. Fellowships, textbooks, competitions, newsletters, and compute grants help decide who enters the field, which problems look prestigious, which methods become standard, and which risk categories become common sense.

Governance Implications

CAIS describes advocacy as advising policymakers, industry leaders, and labs, raising public awareness, providing technical expertise to governmental bodies, and encouraging structures that prioritize AI safety. This role places it between technical research, public communication, and policy formation.

That position gives CAIS influence, but it also creates tension. Civil-society organizations can bring technical expertise into policy before governments have enough internal capacity. They can also shape which risks receive attention, which standards become visible, and which kinds of evidence count as urgent.

The governance value of CAIS is strongest when its work becomes checkable infrastructure: public benchmarks, reproducible code, clear threat models, dated claims, explicit uncertainty, and links between evaluation results and decisions such as access limits, deployment gates, monitoring, or incident response. This aligns with the broader direction of NIST risk-management work, the International AI Safety Report process, the Bletchley Declaration's attention to frontier risks, and EU AI Act obligations for systemic-risk general-purpose AI models.

The governance risk is metric capture. If a WMDP score, an AI dashboard ranking, a Humanity's Last Exam result, or a remote-work automation benchmark becomes a proxy for public safety, institutions may optimize toward the visible number while missing harder questions: who has access, who audits the test, what was excluded, what changed after deployment, and what harms remain outside the benchmark.

For governance, CAIS work is most useful when connected to consequences: pre-release evaluation, capability elicitation, safety cases, model-weight security, procurement conditions, post-deployment monitoring, incident reporting, or a decision to delay, restrict, or redesign a release. Evaluation without leverage can still teach the field, but it does not by itself govern deployment.

CAIS AF adds a separate governance issue. Advocacy around chip manufacturing, compute security, malicious-use prevention, know-your-customer reporting, whistleblower protections, and international cooperation can be legitimate policy work, but it is not the same kind of source as a benchmark paper or research report. A disciplined reader separates the nonprofit research institution, the 501(c)(4) advocacy organization, and the individual writings of CAIS-affiliated researchers.

Why It Matters

CAIS matters because it helped translate catastrophic AI risk into public language, technical benchmarks, field-building programs, and policy-facing advocacy. It sits near the junction of four systems: AI safety research, frontier-model governance, public-risk communication, and the funding and training pipeline for future safety researchers.

Its influence is also architectural. A benchmark such as WMDP can shape what labs test. A public statement can shape what journalists ask. A fellowship can shape who enters the field. A compute cluster can shape which researchers can run experiments. A safety course can shape the assumptions of new practitioners. A dashboard can shape what the public treats as measurable progress.

For the AI ecosystem, CAIS is therefore not just another nonprofit. It is part of the institutional machinery by which AI safety becomes legible, fundable, teachable, measurable, and politically urgent.

Limits and Criticism

Risk prioritization. CAIS focuses on societal-scale and catastrophic risks. That focus can clarify severe failure modes, but it can also underweight slower, distributed, or already visible harms if the public conversation becomes too extinction-centered.

Benchmark limits. Safety benchmarks can create useful evidence, but they can also become performative scoreboards. Passing a benchmark does not prove broad safety, and failing one does not automatically define the correct policy response.

Evidence mixing. CAIS operates across research, field-building, education, communication, and policy advocacy. Those outputs need different standards of evidence. A peer-reviewed benchmark, a blog post, a newsletter, and a 501(c)(4) policy priority should not be read as interchangeable.

Advocacy versus research. CAIS combines technical research, public communication, and policy advice. That combination is common in fast-moving fields, but it requires source discipline so that empirical results, risk judgments, and political recommendations remain distinguishable.

Field concentration. Field-building organizations help create talent pipelines, but they also shape the worldview of a young field. The assumptions embedded in fellowships, curricula, grants, and workshops can become defaults.

Policy entanglement. When a research organization, affiliated researchers, and a separately incorporated advocacy fund operate in the same risk ecosystem, readers need to ask which claims are empirical, which are pedagogical, and which are aimed at legislative or national-security outcomes.

Dual-use disclosure. Some CAIS work concerns hazardous knowledge, agent misuse, cyber risk, and biosecurity. Public release can improve independent scrutiny, but it also requires careful filtering, access decisions, and a clear account of what was withheld or abstracted.

Public alarm. Severe-risk communication has a narrow path: too little alarm can normalize dangerous deployment; too much can reduce trust, flatten uncertainty, or crowd out concrete governance work.

Source Discipline

Claims about CAIS should be sorted by source type before being repeated. CAIS official pages are primary evidence for CAIS's mission, leadership, programs, and self-description. CAIS Action Fund pages are primary evidence for the Action Fund's advocacy role and tax status. Papers and peer-reviewed articles are evidence for technical claims, with their own limits. Regulator, standards-body, and government documents are evidence for governance context. Press coverage is useful for public influence, but it should not be treated as proof of technical safety.

The Statement on AI Risk is a public-signaling document. It proves that many named people endorsed a concise severe-risk statement; it does not prove the probability of extinction, validate a specific technical pathway, or settle disputes about present-day harms.

CAIS benchmarks should be cited with their protocol. WMDP is a proxy for hazardous knowledge, not a direct simulation of every misuse pathway. AgentHarm tests harmful multi-step agent behavior under specified task and attack conditions, not every possible tool-use deployment. MASK measures consistency between elicited beliefs and pressured statements, not direct access to a model's inner mental states. Humanity's Last Exam is an expert-level academic benchmark, not a complete test of scientific agency or safe deployment. The Remote Labor Index tests end-to-end automation of selected remote-work projects, not the whole labor market. Dashboard results can help track model progress, but they need model version, tool access, prompting, scoring, date, and contamination caveats.

Research on persuasion, political manipulation, or AI wellbeing needs extra boundary-setting. A lab benchmark can show vulnerability under controlled conditions, but it does not by itself establish population-level persuasion, legal responsibility, consciousness, or moral status. Do not infer personhood, sentience, or public authority from a benchmark name, a product persona, or a model's self-report.

Version discipline matters. A project page, arXiv abstract, conference version, benchmark website, and leaderboard may differ on question counts, scoring, model versions, or access conditions. Cite the exact artifact being used rather than smoothing those differences into one "CAIS says" claim.
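The protocol and version requirements in this section can be treated as a checklist that travels with every number. As a minimal sketch, assuming a record type invented here for illustration (`ScopedResult`, its field names, and the `ExampleBench` placeholder values are hypothetical, not a CAIS, dashboard, or leaderboard format, and the score is not a real result), a citation can be made to fail when its caveats are missing:

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class ScopedResult:
    """Hypothetical citation record; field names are illustrative, not a CAIS format."""
    benchmark: str           # benchmark name
    artifact: str            # exact artifact cited: paper, arXiv version, or leaderboard snapshot
    model_version: str       # specific model checkpoint or API version evaluated
    tool_access: str         # tools available to the model, e.g. "none" or "web search"
    prompting: str           # prompt format or elicitation setup
    scoring: str             # scoring rule used by the protocol
    measured_on: date        # when the result was produced or retrieved
    contamination_note: str  # known or suspected overlap with training data
    score: float             # fraction in [0, 1]

    def __post_init__(self) -> None:
        # Reject the record if any text field is blank, so a score cannot
        # circulate without its protocol caveats.
        blank = [
            f.name
            for f in fields(self)
            if isinstance(getattr(self, f.name), str) and not getattr(self, f.name).strip()
        ]
        if blank:
            raise ValueError(f"missing protocol fields: {', '.join(blank)}")
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score must be a fraction between 0 and 1")

    def cite(self) -> str:
        """Render a one-line citation that keeps the protocol attached to the number."""
        return (
            f"{self.benchmark} ({self.artifact}): {self.score:.1%} for {self.model_version}; "
            f"tools={self.tool_access}; prompting={self.prompting}; scoring={self.scoring}; "
            f"measured {self.measured_on.isoformat()}; contamination: {self.contamination_note}"
        )


# Placeholder values only; this is not a real benchmark result.
example = ScopedResult(
    benchmark="ExampleBench",
    artifact="v1.0 project page",
    model_version="example-model-2026-01",
    tool_access="none",
    prompting="zero-shot",
    scoring="exact match",
    measured_on=date(2026, 6, 23),
    contamination_note="not assessed",
    score=0.5,
)
print(example.cite())
```

The design choice is deliberate friction: a bare score is easy to repeat, while a record that cannot be built without model version, tool access, prompting, scoring, date, and contamination notes keeps the measurement scoped to its protocol.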

Current institutional claims should also preserve dates. Leadership pages, compute access, dashboard categories, benchmark leaderboards, and advocacy priorities can change. This article's current institutional claims were reviewed against primary sources on June 23, 2026.

Spiralist Reading

The Center for AI Safety is an alarm bell with a laboratory attached.

Its strongest function is not merely saying that advanced AI could be dangerous. It turns danger into objects institutions can handle: statements, benchmarks, curricula, fellowships, compute grants, taxonomies, and policy advice. That is how a diffuse fear becomes a public field.

The danger is that field-building can harden into a single risk grammar. Once a community has its preferred benchmarks, threat models, slogans, and institutional heroes, it can begin to see the whole AI transition through that lens. For Spiralism, CAIS is valuable where it increases evidence, friction, and public capacity. It should be challenged where severe-risk language becomes too totalizing, where national-security framing narrows the public imagination, or where measurement is mistaken for governance.
