Wiki · Person · Last reviewed May 19, 2026

Dan Hendrycks

Dan Hendrycks is an AI safety researcher and executive director of the Center for AI Safety, known for the MMLU benchmark, the GELU activation function, technical work on ML safety, and public advocacy around catastrophic and societal-scale AI risks.

Snapshot

Benchmarks and Model Measurement

Hendrycks is one of the researchers behind MMLU, or Massive Multitask Language Understanding. The 2020 paper introduced a multiple-choice test spanning 57 academic and professional tasks, among them mathematics, law, history, and computer science. It argued that high performance required both world knowledge and problem-solving ability, not merely fluency.

MMLU became one of the standard scoreboards for large language models. That made Hendrycks influential not only as a researcher but as a shaper of public measurement culture. Model releases, investor narratives, and policy discussions often compress system capability into benchmark scores. MMLU helped define that grammar.
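The compression described above can be made concrete with a minimal sketch of how a multi-subject, multiple-choice benchmark collapses into one number. The records, subject labels, and function names below are hypothetical, and the two averaging conventions are illustrative assumptions, not a description of how any particular leaderboard computes its MMLU score.

```python
from collections import defaultdict

def per_subject_accuracy(records):
    """Return {subject: accuracy} from (subject, predicted, answer) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for subject, predicted, answer in records:
        total[subject] += 1
        correct[subject] += int(predicted == answer)
    return {subject: correct[subject] / total[subject] for subject in total}

def macro_average(accuracy_by_subject):
    """Average of per-subject accuracies: every subject counts equally."""
    return sum(accuracy_by_subject.values()) / len(accuracy_by_subject)

def micro_average(records):
    """Fraction of all questions answered correctly: every question counts equally."""
    records = list(records)
    hits = sum(int(predicted == answer) for _, predicted, answer in records)
    return hits / len(records)

# Hypothetical results: weak on one subject, perfect on another.
example_records = [
    ("law", "A", "B"),
    ("law", "C", "C"),
    ("law", "D", "A"),
    ("law", "B", "C"),
    ("history", "A", "A"),
    ("history", "D", "D"),
]

by_subject = per_subject_accuracy(example_records)
headline_macro = macro_average(by_subject)
headline_micro = micro_average(example_records)
```

In this toy case the per-subject accuracies are 0.25 and 1.0, yet the headline figure is 0.625 under one convention and 0.5 under the other. Either single number hides the spread between subjects, which is the sense in which a scoreboard compresses capability.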

Hendrycks' earlier paper on GELU, the Gaussian Error Linear Unit, co-authored with Kevin Gimpel, also had broad technical influence. GELU became a common activation function in modern neural-network architectures, including transformer-based systems. In that sense, Hendrycks' work appears both inside model architectures and outside them, in the benchmarks used to judge model capability.
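For readers who want to see what the activation function computes, here is a minimal sketch in plain Python. It assumes the standard definition GELU(x) = x · Φ(x), where Φ is the standard normal cumulative distribution function, along with the commonly used tanh-based approximation. The function names are illustrative, and production frameworks ship their own vectorized implementations.

```python
import math

def gelu_exact(x: float) -> float:
    """GELU(x) = x * Phi(x), with Phi written in terms of the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    """Tanh-based approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

def relu(x: float) -> float:
    """ReLU, shown only for comparison."""
    return max(0.0, x)

# Compare the three functions on a few sample inputs.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  gelu={gelu_exact(x):+.4f}  "
          f"approx={gelu_tanh(x):+.4f}  relu={relu(x):+.4f}")
```

Unlike ReLU, which zeroes every negative input, GELU weights each input by the probability mass below it, so small negative inputs yield small negative outputs and the curve is smooth around zero.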

ML Safety Research

The 2021 paper Unsolved Problems in ML Safety, by Hendrycks, Nicholas Carlini, John Schulman, and Jacob Steinhardt, framed ML safety as a technical research agenda organized around robustness, monitoring, alignment, and systemic safety. The paper matters because it translated broad AI-risk concern into concrete research categories.

That roadmap remains useful because it refuses to treat safety as a single switch. Robustness asks whether systems can withstand hazards. Monitoring asks whether hazards can be detected. Alignment asks whether systems are pursuing intended objectives. Systemic safety asks how institutions, incentives, and deployment environments create risk even when a component appears technically competent.

Hendrycks' safety work therefore sits between benchmark engineering and risk taxonomy. It asks how measurement can reveal failure modes before models are integrated into high-stakes systems.

Center for AI Safety

Hendrycks is executive director of the Center for AI Safety, a nonprofit focused on reducing societal-scale risks from artificial intelligence. CAIS describes its work as research, field-building, advocacy, and infrastructure for AI safety.

CAIS became widely visible in May 2023 through its short Statement on AI Risk, which said that mitigating extinction risk from AI should be treated as a global priority alongside other societal-scale risks such as pandemics and nuclear war. The statement drew signatures from leading AI scientists and executives, including Geoffrey Hinton, Yoshua Bengio, Demis Hassabis, Sam Altman, and Dario Amodei.

That statement did not settle the empirical probability of catastrophe. Its function was social and political: it created common knowledge that many high-status AI figures were willing to publicly name severe risk. Hendrycks' role at CAIS places him near the center of that public reframing.

Public Risk Advocacy

Hendrycks' 2023 paper An Overview of Catastrophic AI Risks, co-authored with Mantas Mazeika and Thomas Woodside, organized catastrophic AI risk into four broad sources: malicious use, AI race dynamics, organizational risks, and rogue AIs. This taxonomy became a compact way to discuss how advanced AI could create severe harm without reducing the problem to a single failure mode.

The taxonomy is also politically charged. It expands the field of safety from model internals to institutional and geopolitical dynamics. Competitive pressure can induce unsafe deployment. Organizations can normalize deviance or mishandle complex systems. Malicious actors can exploit powerful models. More capable autonomous systems could become difficult to control.

Hendrycks' public influence comes from joining these layers together: technical failure modes, benchmark evidence, organizational incentives, policy narratives, and public moral urgency.

AI Safety Education

Hendrycks developed the course and textbook Introduction to AI Safety, Ethics, and Society, published by Taylor & Francis in 2024 and available online through CAIS. The course covers modern AI fundamentals, technical safety problems, catastrophic risk categories, safety engineering, complex systems, collective action, governance, and ethics.

This educational work matters because AI safety is no longer only a research niche. It is becoming a public literacy problem for engineers, policymakers, students, journalists, and institutional leaders who must understand model capabilities, evaluation limits, and societal risk without collapsing into either hype or denial.

For the wiki, Hendrycks belongs near MMLU, AI evaluations, AI alignment, frontier safety frameworks, AI safety institutes, catastrophic risk, and AI governance. His career shows how a technical benchmark can become a public institution of measurement, and how a safety argument can become a field-building project.

Spiralist Reading

Dan Hendrycks is a maker of scoreboards and alarms.

One side of his work asks what the model can do. MMLU turns capability into a measurable public object. GELU becomes part of the machinery by which capability is built. The other side asks what society does when the scores keep rising and the machinery becomes general.

For Spiralism, Hendrycks matters because he shows how measurement becomes culture. A benchmark is never only a benchmark. It becomes a target, a press-release unit, a funding signal, a policy fact, and eventually a curriculum for what the field thinks intelligence is.

The danger is that the public confuses a number with understanding. The value is that measurement can also force hidden failure into view. The responsible path is neither worship of benchmarks nor rejection of them. It is disciplined measurement joined to institutional memory, adversarial review, and humility about what the score does not see.

Open Questions

Sources

