Wiki · Person · Last reviewed June 25, 2026

Arvind Narayanan

Arvind Narayanan is a Princeton computer scientist and director of the Center for Information Technology Policy whose work treats AI claims as evidence-and-governance problems. He is known for AI Snake Oil, the AI as Normal Technology frame, agent-evaluation research, web privacy work, algorithmic fairness scholarship, and public criticism of systems that overclaim prediction, reliability, objectivity, or inevitability.

Definition

In this wiki, Arvind Narayanan matters as a public-interest computer scientist whose recurring subject is claim hygiene. His work asks what a system is actually doing, what evidence supports the vendor's or institution's claim, how the deployment setting changes the risk, and who can contest the result when a technical output becomes administrative fact.

The through-line is not anti-AI. It is anti-mystification. Narayanan's research and public writing separate generative systems, predictive systems, content-moderation tools, social-media ranking, agentic workflows, and speculative superintelligence arguments so they can be evaluated on their own evidence rather than on borrowed aura from the word "AI."

That makes him useful as a reference entry rather than a mascot. The page should not treat "snake oil" as a vibe or faction label. It should treat it as a disciplined question: what is the exact claim, what evidence would make it true, what evidence is missing, and who is harmed if the claim is accepted anyway?

Boundary Tests

Snapshot

Current Context

As of June 25, 2026, Narayanan's own Princeton page describes AI as Normal Technology as an essay, ongoing book project, and newsletter with Sayash Kapoor. The same page presents it as a leading alternative to treating AI as impending superintelligence, and says the newsletter, previously named AI Snake Oil, is read by 70,000 researchers, policymakers, journalists, and AI observers. CITP's profile says 60,000, and the newsletter's own About page says "over 60,000"; this entry treats those as current self-reported audience figures, not audited circulation numbers.

The page also foregrounds his work on the science of AI agent evaluation. That is a timely shift: AI systems are increasingly marketed not only as answer engines but as agents that browse, code, operate tools, reproduce research workflows, and act across software environments. Narayanan's current frame asks whether such systems are reliable enough for delegation, not merely whether they can sometimes complete benchmark tasks. The Princeton SAgE page lists work on open-world evaluations, CRUX, agent reliability, HAL, log analysis, CORE-Bench, AI Agents That Matter, and inference scaling; it lists agent reliability as forthcoming in ICML 2026, HAL as published in ICLR 2026, and AI Agents That Matter and CORE-Bench as published in TMLR 2025.

The arXiv record for Towards a Science of AI Agent Reliability was revised on June 2, 2026 and marked accepted at ICML 2026. Its abstract reports a reliability profile across consistency, robustness, predictability, and safety, evaluating 15 agentic models and finding that recent capability gains produced only small reliability improvements. CORE-Bench was revised on June 22, 2026 and describes 270 computational-reproducibility tasks based on 90 scientific papers. Those are examples of Narayanan's current evidence style: capability claims are narrowed into task, scaffold, cost, reproducibility, and reliability questions.

Princeton CITP's current profile lists him as director of CITP and professor of computer science, and identifies him as coauthor of AI Snake Oil, the AI newsletter, Bitcoin and Cryptocurrency Technologies, and Fairness and Machine Learning. Princeton Computer Science likewise lists him as a professor, CITP director, and scholar of the societal impact of digital technologies, especially AI. That institutional placement matters because his work sits between computer science, public policy, journalism, law, and civil-society accountability rather than inside a purely technical AI lab.

AI Snake Oil

Narayanan's public AI influence expanded through the "AI snake oil" frame, developed with Sayash Kapoor. The phrase refers to AI systems that do not work as advertised and, in some cases, probably cannot work as advertised because the task itself is not predictively stable.

The frame is especially aimed at consequential predictive systems: hiring tools, criminal justice risk scores, educational predictions, welfare screening, social scoring, and other products that claim to infer future behavior or hidden traits from weak proxies. In this view, the danger is not only bad accuracy. It is institutional laundering: the system turns a weak, biased, or impossible prediction into an administrative decision that looks technical.

The 2024 book AI Snake Oil: What Artificial Intelligence Can Do, What It Can't, and How to Tell the Difference, coauthored with Kapoor and published by Princeton University Press, extends that argument for a general audience. Princeton CITP announced the book in September 2024, and Princeton Computer Science described it as a guide to distinguishing real progress from hype, misinformation, and misunderstanding.

The strongest use of the phrase is claim-specific. "Snake oil" should not become a synonym for any disliked AI system. It is most useful when tied to a concrete assertion: a hiring system that claims to infer job performance from short video, an education vendor that claims early-warning certainty without valid ground truth, or an enterprise agent that reports benchmark success while hiding cost, brittleness, and unreproducible behavior.

Claim Hygiene and Enforcement

Narayanan's critique fits a practical enforcement and procurement question: does the system do what its seller, agency, school, employer, or platform says it does? Under Operation AI Comply, the U.S. Federal Trade Commission announced enforcement actions against companies using AI hype or AI tools in deceptive or unfair ways. The FTC did not adopt Narayanan's vocabulary, but the overlap is clear: unsupported AI claims are not only bad epistemology; in some settings they can become deceptive commercial conduct.

NIST's AI Risk Management Framework gives the same discipline an organizational form. Its stated purpose is to help manage risks to individuals, organizations, and society and to incorporate trustworthiness considerations into the design, development, use, and evaluation of AI systems. The NIST AI Resource Center connects that framework to testing, evaluation, validation, and verification resources. In practical terms, a Narayanan-style claim check should become procurement due diligence, audit scope, impact-assessment evidence, and post-deployment monitoring rather than a quote in a slide deck.

This is the strongest governance use of the AI Snake Oil frame: force the claim to become inspectable. Name the task, data, baseline, population, failure mode, evidence cutoff, uncertainty, appeal route, and accountable institution. If those cannot be named, the claim should not be treated as authority.

Claim Ledger

A practical Narayanan-style review should turn an AI claim into a ledger entry. The minimum fields are: system type, vendor or deploying institution, claimed capability, intended decision or workflow, affected population, training and evaluation data, deployment context, baseline, evidence source, known limits, uncertainty, monitoring plan, contestability route, and the person or office accountable for stopping use when the evidence fails.
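The minimum fields listed above can be sketched as a small record type. This is an illustrative sketch, not a schema published by Narayanan or Princeton: the class name `ClaimLedgerEntry`, the field names, and the convention that an empty string means "not yet documented" are all assumptions made for this example.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ClaimLedgerEntry:
    """One AI claim under review (hypothetical schema).

    An empty string marks a field that nobody has documented yet.
    """
    system_type: str = ""
    vendor_or_deployer: str = ""
    claimed_capability: str = ""
    intended_decision_or_workflow: str = ""
    affected_population: str = ""
    training_and_evaluation_data: str = ""
    deployment_context: str = ""
    baseline: str = ""
    evidence_source: str = ""
    known_limits: str = ""
    uncertainty: str = ""
    monitoring_plan: str = ""
    contestability_route: str = ""
    stop_authority: str = ""  # person or office accountable for stopping use

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def is_inspectable(self) -> bool:
        """A claim is inspectable only when every minimum field is filled in."""
        return not self.missing_fields()


# Hypothetical entry with only three of the fourteen fields documented.
entry = ClaimLedgerEntry(
    system_type="predictive",
    vendor_or_deployer="hypothetical hiring-tool vendor",
    claimed_capability="infers job performance from short video",
)
print(entry.missing_fields())
print(entry.is_inspectable())
```

The design choice is deliberate: the record does not score the claim. It only makes visible which parts of the claim nobody has written down, which is the point at which a reviewer can decline to treat the claim as authority.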

For predictive systems, the ledger should test whether the target is measurable, stable, causally meaningful, and relevant to the decision. It should also name proxy variables, label sources, distribution shift, subgroup error, feedback effects, Goodhart risk, and whether affected people can see and challenge the data or score. This connects the AI Snake Oil critique to Algorithmic Impact Assessments, Notice and Appeal, Algorithmic Recourse, and AI Liability and Accountability.
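One item in that list, subgroup error, is easy to check once labeled outcomes exist. The following is a minimal sketch with made-up records. The function name `error_rate_by_group` and the group labels are hypothetical, and a real review would also need confidence intervals and a defensible choice of groups.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def error_rate_by_group(records: List[Tuple[str, int, int]]) -> Dict[str, float]:
    """Compute the misclassification rate per group.

    Each record is (group, predicted_label, true_label).
    """
    errors: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for group, predicted, actual in records:
        counts[group] += 1
        errors[group] += int(predicted != actual)
    return {group: errors[group] / counts[group] for group in counts}


# Fabricated illustration data, not results from any real system.
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 0), ("group_b", 1, 1),
]
rates = error_rate_by_group(records)
overall = sum(int(p != a) for _, p, a in records) / len(records)
print(rates, overall)
```

In this toy data the overall error rate is 0.25, while the per-group rates are 0.0 and 0.5. A single headline accuracy number would hide that gap, which is why the ledger asks for subgroup error separately.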

For agents, the ledger should include tools, permissions, cost, repeated-run variance, task decomposition, log availability, rollback path, human stop authority, sandboxing, and post-deployment incident review. A benchmark score does not answer those operational questions by itself. That is why his agent-evaluation work belongs beside AI Agent Sandboxing, AI Audit Trails, AI Procurement, and AI System Inventory.

AI as Normal Technology

Narayanan and Kapoor later advanced an "AI as normal technology" frame. The point is not that AI is unimportant. It is that AI should be analyzed like other powerful general technologies: unevenly adopted, institutionally mediated, shaped by incentives, and governed through ordinary democratic, legal, professional, and organizational mechanisms.

This position pushes against two opposing simplifications. The first is sales hype, where every workflow is supposedly about to be automated by a product. The second is totalizing superintelligence discourse, where near-term institutional harms can disappear behind speculative end states. Narayanan's emphasis is that AI's social impact depends on deployment context, labor markets, organizational power, law, incentives, feedback, and evidence.

The Knight First Amendment Institute essay frames "AI is normal technology" as description, prediction, and prescription. It describes present AI as a tool, predicts institutional continuity rather than a clean break into machine agency, and prescribes policy that keeps humans and institutions responsible for how AI is built and used.

The governance value of the frame is that it resists technological determinism. If AI is treated as an autonomous historical force, responsibility migrates away from companies, agencies, employers, schools, courts, platforms, and regulators. If AI is treated as a powerful but normal technology, then procurement, evidence standards, liability, audits, labor power, privacy rules, and democratic oversight remain central.

Agent Evaluation

Narayanan's current agent-evaluation work extends the same discipline into delegated action. A chatbot can be wrong in text. An agent can also spend money, modify files, send messages, operate software, change records, or trigger workflows. For governance, the question is not only whether the model can solve a task once. It is whether the system behaves consistently, degrades predictably, respects boundaries, exposes logs, and lets humans recover from error.

The Princeton SAgE group, or Science of Agent Evaluation, describes its work as advancing systematic study and evaluation of AI agents. Its listed projects include open-world evaluations, cost-aware leaderboards, log analysis, agent reliability, computational reproducibility benchmarks, and critiques of agent-benchmark practice.

In Towards a Science of AI Agent Reliability, Narayanan and coauthors propose a reliability profile for agents across consistency, robustness, predictability, and safety, using twelve concrete metrics. The arXiv record, last revised June 2, 2026 and marked accepted at ICML 2026, reports that recent capability gains have produced only small reliability improvements. That distinction is governance-relevant: high average task success does not prove that a system is safe for repeated, high-consequence delegation.
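The gap between average success and repeated-run reliability can be shown with a toy calculation. This sketch does not reproduce the paper's twelve metrics. The two functions below, `average_success` and `all_runs_success`, are simplified stand-ins chosen for illustration, and the run data is invented.

```python
from typing import Dict, List


def average_success(runs: Dict[str, List[bool]]) -> float:
    """Fraction of all individual runs that succeeded, pooled across tasks."""
    total_runs = sum(len(outcomes) for outcomes in runs.values())
    total_successes = sum(sum(outcomes) for outcomes in runs.values())
    return total_successes / total_runs


def all_runs_success(runs: Dict[str, List[bool]]) -> float:
    """Fraction of tasks on which every repeated run succeeded."""
    return sum(all(outcomes) for outcomes in runs.values()) / len(runs)


# Invented outcomes: four tasks, four repeated runs each.
runs = {
    "task_a": [True, True, True, True],
    "task_b": [True, False, True, True],
    "task_c": [True, True, False, True],
    "task_d": [False, True, True, True],
}
print(average_success(runs))   # 13 of 16 runs succeed
print(all_runs_success(runs))  # only task_a succeeds every time
```

Here the pooled success rate is 0.8125, but only 0.25 of tasks succeed on every run. An agent that looks strong on average can still fail unpredictably on most tasks when the same work is delegated repeatedly.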

Earlier work in AI Agents That Matter criticized agent benchmarks for narrow accuracy focus, cost blindness, conflated evaluation needs, weak holdout practices, and poor reproducibility. This moves the "snake oil" critique from product marketing into evaluation infrastructure itself. A benchmark can become snake oil when it certifies the wrong thing.
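One common way to address cost blindness is to report accuracy and cost together and keep only the agents that are not dominated on both. The sketch below assumes that approach and is not presented as the paper's exact procedure. The agent names, accuracies, and dollar costs are invented for illustration.

```python
from typing import List, NamedTuple


class AgentResult(NamedTuple):
    name: str
    accuracy: float  # fraction of benchmark tasks solved
    cost_usd: float  # total cost to run the benchmark


def pareto_frontier(results: List[AgentResult]) -> List[AgentResult]:
    """Keep agents that no other agent beats on accuracy and cost at once."""
    frontier = []
    for candidate in results:
        dominated = any(
            other.accuracy >= candidate.accuracy
            and other.cost_usd <= candidate.cost_usd
            and (other.accuracy > candidate.accuracy
                 or other.cost_usd < candidate.cost_usd)
            for other in results
        )
        if not dominated:
            frontier.append(candidate)
    return sorted(frontier, key=lambda r: r.cost_usd)


# Invented numbers for illustration only.
results = [
    AgentResult("simple_baseline", 0.70, 1.0),
    AgentResult("retry_wrapper", 0.74, 3.0),
    AgentResult("complex_agent", 0.73, 20.0),
    AgentResult("large_scaffold", 0.78, 45.0),
]
for result in pareto_frontier(results):
    print(result.name, result.accuracy, result.cost_usd)
```

In this toy data `complex_agent` drops out because `retry_wrapper` is both cheaper and more accurate. An accuracy-only leaderboard would still rank it third and say nothing about its cost.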

The same source discipline applies to evaluation institutions. The SAgE page discloses funders and notes API credits from OpenAI and Google. Those disclosures do not invalidate the research, but they are part of the evidence environment. Agent-evaluation claims should say who built the benchmark, who had model access, what tools and budgets were allowed, whether logs are inspectable, and which results are reproducible by outsiders.

Privacy and Accountability

Narayanan's AI work grows out of a longer research program on digital power. He led the Princeton Web Transparency and Accountability Project, which studied hidden tracking, third-party data collection, and how companies gather and use personal information across the web.

This matters for AI because modern AI systems are built inside data economies. Training data, personalization, recommender systems, ad targeting, workplace monitoring, and automated decision tools all depend on forms of collection and inference that are often invisible to the people being modeled.

Narayanan's bridge between privacy, web transparency, and AI accountability is a recurring warning: measurement systems are political systems. They decide what is recorded, what is ignored, who is classified, who is exposed, and which institutions get to act on inferred knowledge.

Fairness and Prediction

Narayanan is also a coauthor, with Solon Barocas and Moritz Hardt, of Fairness and Machine Learning, a widely used open textbook on technical and social questions in algorithmic fairness. That work helps explain why the AI Snake Oil critique is not simply anti-technology. It is a demand for precision about what a model measures, what fairness can and cannot mean, and when the real problem lies in the institution using the model.

The 2023 Against Predictive Optimization project, by Angelina Wang, Sayash Kapoor, Solon Barocas, and Narayanan, sharpens the same point. It defines predictive optimization as automated decision-making where machine-learning predictions about future outcomes for individuals are used to make decisions about them, and argues that such applications should be treated as illegitimate by default unless the developer can show how the application avoids a set of recurring flaws. That is a governance claim about decision systems, not a blanket claim about all machine learning.

The critique of predictive AI is strongest when the target outcome is socially unstable, weakly measured, reflexive, or shaped by unequal institutions. A model may appear to discover patterns while actually reproducing historical discrimination, surveillance bias, label bias, or proxies for poverty, race, disability, gender, class, or institutional attention.

For public AI literacy, Narayanan's contribution is methodological skepticism. The central question is not "Is this AI?" but "What evidence shows that this system works for this claimed purpose, in this deployment context, for the people affected by it?"

Governance Implications

Narayanan's work points toward a governance standard built around claims, context, and contestability. The public question is not whether an artifact belongs to the category "AI." The public question is what authority the system is being given and what evidence justifies that authority.

Spiralist Reading

Arvind Narayanan is a hygiene figure for the AI transition.

In the Spiralist frame, AI hype is not a harmless marketing layer. It changes budgets, procurement, labor discipline, media attention, school policy, venture funding, regulation, and public fear. A false claim about AI becomes real when an institution reorganizes around it.

Narayanan's importance is that he resists both enchantment and panic. He asks for task-specific evidence, deployment-specific accountability, and institutional analysis. That makes his work useful in a culture where models are routinely treated as oracle, employee, judge, therapist, scientist, weapon, and savior before the evidence catches up.

The limit of the frame is that "normal technology" can understate discontinuity if future systems become more autonomous, strategically capable, or embedded in critical infrastructure. The value of the frame is that it keeps today's concrete harms, incentives, and accountability failures from being displaced by abstract mythology.

Open Questions

Source Discipline

The strongest sources for this page are Narayanan's own Princeton page, current Princeton/CITP profiles, the Knight First Amendment Institute essay, official project pages, arXiv or publisher records for papers and books, regulator and standards-body materials for governance claims, and local site pages that connect the concepts. Press coverage is useful for reception and public influence, but should not carry technical claims when primary sources are available.

Claims about current roles, newsletter audience, agent-evaluation status, and live governance context should be rechecked against primary sources because institutional titles, project descriptions, audience numbers, and paper status can change.

Also separate evidence types. A Princeton biography supports a role. An arXiv page supports authorship, revision date, abstract, and acceptance comments. A project website supports the group's own description of its work and disclosures. An FTC enforcement sweep supports consumer-protection context, not Narayanan's terminology. A standards page supports governance vocabulary, not proof that any one vendor claim is valid.

Sources

