Wiki · Person · Last reviewed May 19, 2026

Percy Liang

Percy Liang is a Stanford computer scientist and director of the Center for Research on Foundation Models (CRFM). He is known for research in machine learning and natural language processing, for shaping the foundation-model research agenda, and for work on holistic model evaluation and transparency tools for large AI systems.

Foundation Models

Liang's public prominence stems largely from Stanford's foundation-model agenda. In 2021, Stanford HAI announced the Center for Research on Foundation Models as an interdisciplinary initiative to study the technical, social, legal, economic, and governance implications of models trained on broad data at scale and adapted to many downstream tasks.

The CRFM announcement named Liang as director and described foundation models such as BERT, GPT-3, CLIP, and Codex as a new paradigm for building AI systems. The accompanying report, On the Opportunities and Risks of Foundation Models, argued that these systems create leverage, because one base model can support many applications, but also concentrate risk: defects, biases, security weaknesses, and opacity in the base model can be inherited by every downstream system adapted from it.

That framing proved durable because it did not treat large models merely as a technical improvement. It described a sociotechnical platform shift in which model providers, data sources, compute, benchmarks, downstream developers, affected communities, and regulators all become parts of one ecosystem.

Evaluation and HELM

Liang is also central to the evaluation turn in AI governance. Stanford CRFM's Holistic Evaluation of Language Models, or HELM, was built to evaluate language models across many scenarios and metrics rather than compressing performance into a single leaderboard score.

The HELM paper and project emphasized transparency, standardization, broad scenario coverage, and multiple dimensions of performance. Accuracy matters, but so do calibration, robustness, fairness, bias, toxicity, efficiency, uncertainty, and the limits of the benchmark itself.

This matters because model evaluation has become a governance primitive. Governments, labs, companies, journalists, users, and auditors all ask similar questions: what can this model do, where does it fail, what risks does it create, and what evidence supports the provider's claims?

Transparency Work

CRFM's later work on the Foundation Model Transparency Index extended the same logic from benchmark performance to public disclosure. The index scores major foundation-model developers on the information they disclose about upstream resources, model properties, and downstream use.

The 2025 Foundation Model Transparency Index framed transparency as a public-accountability problem: the most influential model developers shape products, research, labor, information systems, and public institutions, but outside actors often lack basic information about data, labor, compute, evaluation, safety, distribution, and use.

Liang's significance is therefore not only technical. His work represents an academic attempt to build shared measurement infrastructure around systems that private companies otherwise describe through marketing, selective benchmark releases, and limited safety reports.

Earlier Research

Before the foundation-model wave, Liang worked across machine learning and natural language processing. Stanford's profile lists research areas including robustness, interpretability, human interaction, learning theory, grounding, semantics, and reasoning. It also describes him as a proponent of reproducibility through CodaLab Worksheets.

His publication record includes work on data poisoning, distribution shift, prefix-tuning, concept bottleneck models, uncertainty calibration, semantic parsing, weak supervision with natural-language explanations, and many other areas that later became relevant to large-model evaluation and deployment.

This breadth explains his role in the foundation-model conversation. The problem is not just whether a model can answer a prompt. It is whether a broad adaptive system can be understood, compared, reproduced, governed, and trusted across changing contexts.

Spiralist Reading

Percy Liang is a cartographer of the model layer.

In the Spiralist frame, foundation models are not only artifacts. They are hidden infrastructure for future speech, work, law, education, medicine, search, coding, and memory. They sit beneath many applications while remaining difficult for ordinary institutions to inspect.

Liang's work matters because it names the layer and demands instruments for it. The foundation-model frame gives society a shared object of analysis. HELM and transparency indexes ask whether that object can be measured in public rather than trusted in private.

The warning is that measurement can also become theater. A benchmark, index, or disclosure template can discipline the field only when it remains open to revision and adversarial scrutiny, and attentive to missing harms and to the lived reality of people downstream.
