Rebooting AI and the Problem of Common Sense
Gary Marcus and Ernest Davis's Rebooting AI is useful because it refuses two lazy stories at once. It does not say artificial intelligence is fake, and it does not say that fluent systems already understand. It says that impressive pattern recognition is not yet the same thing as robust understanding, and that the gap matters most when institutions hand systems real authority.
Common sense, in this review, means the background competence that lets a system connect words to causes, bodies, time, physical constraints, social expectations, exceptions, and consequences. It is not a mystical property. It is the engineering and governance problem of deciding when fluent output is grounded enough to deserve action rights.
The test is concrete: where does a system's model of the world stop, which permissions begin at that boundary, and what evidence would justify letting the output become a record, tool call, purchase, denial, diagnosis, or policy step?
The Book
Rebooting AI: Building Artificial Intelligence We Can Trust appeared in 2019 and was later issued as a Vintage paperback. Penguin Random House lists Gary Marcus and Ernest Davis as the authors, gives the Vintage paperback as published on August 25, 2020 at 288 pages, and describes the book as an argument for robust AI beyond narrow, closed systems.
Marcus is a cognitive scientist and AI critic whose work has long pushed against pure data-scaling stories. Davis is a computer scientist at NYU known for work on common-sense reasoning. Their collaboration matters because the book is not only a policy warning. It is a cognitive argument about what kind of machinery intelligence requires.
Current Context
As of June 24, 2026, the book has to be read between two facts. The first is that generative, multimodal, reasoning, and tool-using systems have made many practical capabilities more ordinary than the 2019 public debate expected. The second is that those gains have not eliminated the gap between a useful output and a system that can safely carry authority across changing context, missing evidence, adversarial pressure, and institutional stakes.
The public lesson of ChatGPT's November 2022 release was not simply fluency. OpenAI's own launch note named familiar limits: plausible but wrong answers, sensitivity to phrasing, guessing instead of asking clarifying questions, and unsafe or biased behavior. Those limitations are not trivia. They are the ordinary cracks through which a language interface can become an administrative error, a misleading explanation, a brittle workflow, or an overconfident agent action.
Current governance language now treats that problem as assurance. NIST's AI Risk Management Framework and Generative AI Profile provide lifecycle risk-management vocabulary; NIST's 2025 TEVV outline treats testing, evaluation, verification, and validation as scoped, documented, time-bound practices; ISO/IEC 42001, 42005, and 42006 turn management systems, impact assessments, and certification-body competence into standards questions; and NIST's 2026 AI Agent Standards Initiative frames autonomous action, identity, authorization, interoperability, and agent security as infrastructure problems. OWASP's LLM and agentic-application guidance adds the security vocabulary for prompt injection, excessive agency, tool misuse, identity abuse, and overreliance.
The European context also needs careful dating. The Commission's AI Act page says the Act entered into force on August 1, 2024 and is generally applicable from August 2, 2026, with staged application dates: prohibited practices and AI-literacy duties began applying on February 2, 2025, GPAI rules on August 2, 2025, and Article 50 transparency obligations on August 2, 2026. After a May 7, 2026 political agreement on the AI omnibus, the same page says high-risk rules for certain areas including education, employment, critical infrastructure, migration, and biometrics will apply from December 2, 2027, while product-embedded high-risk systems receive an extended period to August 2, 2028. That timeline matters because organizations are already deploying systems before every compliance layer is in force.
The Gap
The book's central move is to separate task performance from situated understanding. A system can win a game, classify images, translate routine phrases, or produce plausible prose while still failing when the world changes, the task becomes open-ended, or the situation requires background knowledge that was never explicitly stated.
This distinction is easy to blur because AI systems are usually encountered through polished interfaces. The user sees an answer, a route, a recommendation, a generated paragraph, or a confidence score. The interface compresses uncertainty into a finished object. Rebooting AI asks the reader to look behind that surface and ask what the system actually tracks, what it merely correlates, and what breaks when conditions shift.
That remains a live problem after the public release of ChatGPT in November 2022 and the rapid normalization of large language models as workplace, search, coding, and writing interfaces. Scale has made systems more fluent, more useful, and more surprising than the 2019 public conversation expected. Reasoning models add another layer by spending more inference-time computation before answering. But longer deliberation and better benchmark performance still do not abolish the difference between producing a plausible continuation and maintaining a reliable model of cause, context, exception, embodiment, and consequence.
The sharper definition of the gap is not "models cannot do useful things." They plainly can. It is that capability under one distribution does not prove competence under institutional responsibility. A benchmark, demo, or prompt exchange can show skill without showing that the system can notice when the task has changed, when the request is malicious, when the evidence is weak, when a missing fact should block action, or when the human consequences exceed its design envelope. That boundary is the practical meaning of common sense: not a soul in the machine, but a documented limit on what the system can responsibly infer and do.
That is the transfer problem at the heart of the review: moving from answer quality to delegated authority. A model that drafts a memo is one kind of system. A model wired to retrieve private records, rank people, call tools, file forms, spend money, or change a database is another. Common sense becomes a safety property when language is connected to action.
Common Sense as Infrastructure
Marcus and Davis treat common sense as a technical problem, not a folksy virtue. Common sense includes ordinary physical reasoning, social expectations, causal structure, time, object permanence, goals, affordances, and the implicit background facts that humans use without noticing.
That kind of knowledge is not decorative. It is infrastructure for action. A medical system needs to know when a symptom is urgent even if the wording is unusual. A household robot needs to know that a glass near an edge can fall. A legal or welfare assistant needs to know that people describe the same life event in partial, frightened, contradictory ways. A content or safety system needs to know that context can reverse meaning.
Operationally, a system has enough common sense for a deployment only if it can preserve the constraints that matter in that setting. It must know what kind of evidence is missing, which assumptions are fragile, which actions are irreversible, and when escalation is safer than completion. This is not a demand that machines become human. It is a demand that claims about competence be matched to the setting where the output will be used.
The useful governance definition is therefore local rather than metaphysical. A system has common-sense competence for a role only when the deployment can show that the system tracks the constraints that would make action unsafe in that role: missing source material, conflicting records, disability accommodation, physical risk, user coercion, fraud, privacy boundaries, legal deadlines, reversible versus irreversible actions, and the difference between advice and decision.
That definition turns common sense into a recordkeeping problem. A serious deployment should maintain a competence-boundary record: the intended task, authoritative data sources, known non-capabilities, test conditions, excluded populations or contexts, tool permissions, irreversible actions, monitoring signals, escalation triggers, and repair path. Without that record, common sense becomes a vibe inferred from fluent language.
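One way to make the competence-boundary record concrete is a small data structure whose fields mirror the list above. This is only a sketch: the class name, field names, and the missing_fields helper are hypothetical illustrations, not a schema taken from the book or from any standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields


@dataclass
class CompetenceBoundaryRecord:
    """Hypothetical record; each field corresponds to one item named in the text."""

    intended_task: str = ""
    authoritative_data_sources: list[str] = field(default_factory=list)
    known_non_capabilities: list[str] = field(default_factory=list)
    test_conditions: list[str] = field(default_factory=list)
    excluded_contexts: list[str] = field(default_factory=list)
    tool_permissions: list[str] = field(default_factory=list)
    irreversible_actions: list[str] = field(default_factory=list)
    monitoring_signals: list[str] = field(default_factory=list)
    escalation_triggers: list[str] = field(default_factory=list)
    repair_path: str = ""

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still empty.

        Assumption: an empty value means "not yet documented". A deployment
        with no irreversible actions would record an explicit entry such as
        "none" rather than leave the list empty.
        """
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Usage: a partially completed record reports what is still undocumented.
draft = CompetenceBoundaryRecord(
    intended_task="draft benefit letters for caseworker review",
    repair_path="caseworker correction queue",
)
undocumented = draft.missing_fields()
```

A review step could refuse to approve a deployment while missing_fields() is non-empty, which keeps the record from being a formality filled in after launch.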
The authors' broader point is that intelligence cannot be trusted merely because it works on benchmarks. Benchmarks are legible environments. Human life is not. A model can become excellent at the test-shaped version of the world while remaining brittle in the world people actually inhabit. That is why benchmark pressure becomes a governance issue, not just a research issue: the test can become the curriculum, and the curriculum can become the institution's definition of reality.
Trust Is a Design Burden
The subtitle, "Building Artificial Intelligence We Can Trust," is doing real work. Trust is not a feeling the interface should induce. It is a burden the system must earn through robustness, transparency, correction, causal competence, and bounded deployment.
This is where the book connects to institutional life. Many organizations want AI to absorb uncertainty: summarize the file, rank the applicant, flag the risk, draft the response, drive the vehicle, advise the patient, monitor the worker. The system becomes attractive precisely because it turns messy judgment into an output that can be routed through a workflow.
But if the system lacks common sense, the institution does not escape judgment. It only relocates judgment into training data, vendors, prompts, dashboards, review queues, and liability language. The person affected by the decision may then face a smooth administrative surface with no obvious place to contest the mistake.
Trust therefore has to be built as a chain of evidence and recourse. What was the system asked to do? What data and tools could it access? What uncertainty was shown to the user? What logs exist? Who could stop the action? Who can correct the record later? Without those answers, "trustworthy AI" becomes interface design, not accountability.
The right unit of trust is the whole workflow, not the model in isolation. A model answer that is harmless in a draft box can become harmful once it is copied into a case file, used as a reason code, trusted by a tired reviewer, or handed credentials to act. Trust has to attach to the path from prompt to record to decision to remedy.
The AI-Age Reading
The strongest AI-age reading of Rebooting AI is not "deep learning failed." That would be too simple. Deep learning has transformed language, vision, code, search, translation, recommender systems, and scientific tooling. The sharper reading is that capability gains do not automatically answer the reliability questions that matter when systems act on people.
Large models often behave like extraordinary cultural compressors. They can draw on patterns from vast bodies of human text, code, and media. That makes them powerful at the interface layer: explanation, imitation, drafting, routing, abstraction, and style transfer. It also makes them dangerous when interface success is mistaken for grounded competence.
The book's common-sense argument therefore belongs beside current work on world models, tool-using agents, evaluations, and AI assurance. A future agent may need language fluency, learned representations, explicit knowledge, causal models, memory, planning, and external tools. The question is less which faction wins and more whether the assembled system can remain inspectable, interruptible, and corrigible once it is embedded in real institutions.
This also changes how to read the older symbolic-versus-neural fight. Rebooting AI is not most useful as a team jersey for one architecture. It is useful as a refusal to let any architecture skip the same deployment questions: What does the system know, how do we know it knows that, where does that knowledge fail, and what happens to people when it fails?
The AI-age reading is therefore architectural pluralism with accountability. Retrieval, search, chain-of-thought-like scaffolding, tool use, simulation, explicit constraints, and human review can each help in some settings. None is a waiver from proving that the assembled system handles the actual context, population, permissions, and failure modes of the deployment.
Governance and Safety
The governance implication is that common sense cannot be inferred from demonstration fluency. A system should receive authority only for tasks where its likely failure modes are known, monitored, bounded, and reversible. The more a system can affect rights, money, bodies, records, mobility, or public trust, the more the burden shifts from "it usually answers well" to "we have a safety case for this setting."
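The rule in the preceding paragraph can be written as a small gate. The sketch below is illustrative only: FailureModeEvidence, may_delegate, and the boolean inputs are hypothetical names, and a real review would rest on documented evidence rather than four flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureModeEvidence:
    """Hypothetical summary of what a deployer has established for one task."""

    known: bool       # likely failure modes have been identified
    monitored: bool   # signals exist to detect them in operation
    bounded: bool     # the blast radius of a failure is limited
    reversible: bool  # the resulting action can be undone


def may_delegate(
    evidence: FailureModeEvidence,
    affects_high_stakes: bool,
    has_safety_case: bool,
) -> bool:
    """Return True only when the task meets the bar described in the text.

    All four properties are required for any delegated authority. When the
    task touches rights, money, bodies, records, mobility, or public trust,
    a setting-specific safety case is required as well.
    """
    baseline = (
        evidence.known
        and evidence.monitored
        and evidence.bounded
        and evidence.reversible
    )
    if not baseline:
        return False
    if affects_high_stakes:
        return has_safety_case
    return True


# Usage: a drafting task with full evidence passes; the same evidence is not
# enough for a high-stakes task that lacks a safety case.
full = FailureModeEvidence(known=True, monitored=True, bounded=True, reversible=True)
drafting_ok = may_delegate(full, affects_high_stakes=False, has_safety_case=False)
benefits_ok = may_delegate(full, affects_high_stakes=True, has_safety_case=False)
```

The gate fails closed: any missing property denies authority, so "it usually answers well" never counts as evidence.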
By June 24, 2026, the relevant standards vocabulary treats that burden as lifecycle risk management and assurance. The point is not that standards prove common sense. It is that they force deployers to say what they tested, what they did not test, when the evidence was gathered, which configuration was evaluated, who can inspect the result, and what changes when the system fails. That is the practical bridge from Marcus and Davis's cognitive critique to AI evaluations, model and system cards, third-party assurance, and AI system inventories.
For common-sense failures, the controls are concrete: narrow deployment claims, task-specific evaluations, out-of-distribution tests, source-grounding checks, red-team exercises, least-privilege permissions for agents, incident logs, escalation rules, user-visible uncertainty, appeal paths, and human override with real authority. A chatbot that drafts text needs one control set. An agent that can file forms, call APIs, spend money, send messages, or alter records needs a stricter one because its errors do not remain inside the interface. Security testing must also cover prompt injection, poisoned retrieval context, tool-output manipulation, identity confusion, and privilege escalation.
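Least-privilege permissions for agents can be sketched as an explicit allowlist checked before every tool call. Everything named here (ToolPolicy, authorize, the tool names, and the three outcomes) is a hypothetical illustration, not the interface of any real agent framework.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    """Hypothetical per-deployment policy: anything not listed is denied."""

    allowed_tools: frozenset[str]
    needs_human_approval: frozenset[str]


def authorize(policy: ToolPolicy, tool: str, human_approved: bool = False) -> str:
    """Return "allow", "escalate", or "deny" for a requested tool call.

    Unlisted tools are denied by default. Tools flagged for approval, such as
    ones that spend money or alter records, escalate until a human signs off.
    """
    if tool not in policy.allowed_tools:
        return "deny"
    if tool in policy.needs_human_approval and not human_approved:
        return "escalate"
    return "allow"


# Usage: a drafting assistant may search and draft freely, may file a form
# only with approval, and cannot issue payments at all.
policy = ToolPolicy(
    allowed_tools=frozenset({"search_docs", "draft_reply", "file_form"}),
    needs_human_approval=frozenset({"file_form"}),
)
decisions = {
    tool: authorize(policy, tool)
    for tool in ("draft_reply", "file_form", "issue_payment")
}
```

Denying by default means a prompt-injected request for an unlisted tool fails at the policy layer and does not depend on the model noticing the attack.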
The safety case should include the workflow around the model, not only the model. Who set the system prompt? Which tools are available? Which records can be read or changed? Which reasoning budget, retrieval corpus, or memory state was used? What happens when a tool call fails? Can the action be rolled back? Can an affected person see that AI was used and get the record corrected? These questions connect the common-sense gap to agent observability, tool permission, incident reporting, and ordinary administrative due process.
A useful safety case should answer ordinary questions before procurement or launch: What can the system change? What can it only suggest? Which data sources are authoritative? Which outputs require verification? What counts as an incident? Who receives notice? How are tool permissions revoked? How can an affected person get the record corrected? Can auditors reconstruct the path from input to output to action? If those questions sound administrative, that is the point. Common-sense gaps become harms through administration.
Where the Book Needs Friction
Rebooting AI was written before the public explosion of frontier language models. Some readers will find its emphasis on deep-learning limits too confident in places where scale later changed the practical frontier. A fair review has to say that: many tasks that looked remote in 2019 became ordinary by the middle of the 2020s.
But that does not make the book obsolete. It shifts the burden of interpretation. The book is weakest if treated as a forecast of which architecture would dominate the next product cycle. It is strongest as a checklist for trust: Can the system handle novelty? Can it explain its grounds? Does it know cause from correlation? Can it use background knowledge? Does it fail gracefully? Can affected people appeal?
The strongest correction to the book is that capability can arrive through messy hybrids rather than through the clean research program any critic prefers: scale, retrieval, synthetic data, tool use, human feedback, multimodal training, formal methods, explicit memory, search, and domain-specific scaffolding may all matter. The strongest defense of the book is that none of those ingredients cancels the need to prove fitness in context. Architecture is not accountability.
Independent reception also read the book in that spirit. A 2021 SIAM News review described it as both a scientific assessment and a "food for thought" text, especially useful for clarifying the difference between AI broadly and machine learning narrowly.
What This Changes
Rebooting AI is a book about the danger of mistaking a responsive surface for a responsible mind.
That danger recurs across AI companions, workplace copilots, automated welfare systems, search answers, robotics, safety filters, and agentic tools. A system can sound helpful while lacking the world model needed to understand the downstream effects of help. It can produce confidence while hiding brittleness. It can make an institution feel more rational while narrowing the channels through which reality can object.
The recurring pattern is concrete: an interface turns uncertainty into an answer, an answer into a record, a record into a workflow, and a workflow into authority. Once that happens, the question is no longer whether the model "understands" in a philosophical sense. The question is whether the institution has built enough evidence, limits, and recourse around the system to keep its mistakes from becoming policy.
The site-wide lesson is not a slogan about machine ignorance. It is a deployment rule: when a system cannot reliably carry context, its authority must be narrowed until context is supplied by accountable people, records, tools, and review channels. That is why common sense belongs with benchmark contamination, chain-of-thought monitorability, human oversight, and claim hygiene. The question is how much reality the institution lets back in after the model has answered.
The practical lesson is not to reject AI systems until they become human-like. It is to match authority to demonstrated competence. Use models where their failure modes are tolerable, observable, and correctable. Keep humans responsible where context, care, rights, bodies, money, freedom, and public legitimacy are at stake. Require source trails, appeal paths, narrow permissions, independent audits, and deployment boundaries.
The book's lasting value is its insistence that intelligence is more than output. In an age of fluent machines, that distinction is no longer academic. It is the line between assistance and automated authority.
Source Discipline
This review separates book facts, author context, research claims, reception, and governance sources. Penguin Random House and SIAM are used for edition, publication, and reception facts. The NYU author page is used for Marcus and Davis's collaboration and common-sense reasoning background. Marcus's 2018 paper is used as a critique of deep-learning limits, not as a settled forecast. Davis's 2023 paper is used for the continuing difficulty of word problems that require common-sense reasoning. OpenAI's launch note is used only for the limitations it named at release. NIST, ISO, European Commission, and OWASP sources define current risk-management, assurance, impact-assessment, agent-standards, transparency, and security vocabulary; they are not evidence that any particular AI system is safe.
The interpretive claim is narrower than the hype cycle around AI: systems can be highly useful while remaining brittle when language, context, causality, and institutional authority are confused. This article does not claim that an AI system is conscious, divine, or artificial general intelligence.
Sources
- Penguin Random House, Rebooting AI by Gary Marcus and Ernest Davis, publisher listing for paperback and ebook editions, reviewed June 24, 2026.
- Oana Marin, SIAM News, "How Intelligent is Artificial Intelligence?", January 25, 2021, reviewed June 24, 2026.
- Gary Marcus, "Deep Learning: A Critical Appraisal", arXiv, 2018, reviewed June 24, 2026.
- Ernest Davis, "Mathematics, word problems, common sense, and artificial intelligence", arXiv, 2023, reviewed June 24, 2026.
- Ernest Davis, NYU, author page for Rebooting AI, reviewed June 24, 2026.
- OpenAI, "Introducing ChatGPT", official announcement dated November 30, 2022, reviewed June 24, 2026.
- National Institute of Standards and Technology, AI Risk Management Framework Core, govern, map, measure, and manage functions, reviewed June 24, 2026.
- National Institute of Standards and Technology, Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, NIST AI 600-1, July 26, 2024, reviewed June 24, 2026.
- National Institute of Standards and Technology, Outline: Proposed Zero Draft for a Standard on AI Testing, Evaluation, Verification, and Validation, 2025, reviewed June 24, 2026.
- National Institute of Standards and Technology, AI Agent Standards Initiative and February 17, 2026 announcement, reviewed June 24, 2026.
- NIST National Cybersecurity Center of Excellence, Software and AI Agent Identity and Authorization, reviewed June 24, 2026.
- OWASP, Top 10 for Large Language Model Applications and Top 10 for Agentic Applications 2026, security-risk vocabulary for LLM and agentic deployments, reviewed June 24, 2026.
- European Commission, AI Act, official implementation page for risk levels, transparency rules, GPAI rules, AI omnibus update, and application timeline, reviewed June 24, 2026.
- European Commission AI Act Service Desk, Timeline for the Implementation of the EU AI Act, reviewed June 24, 2026.
- European Commission, Code of Practice on Transparency of AI-Generated Content, published June 10, 2026 and applicable to Article 50 transparency obligations from August 2, 2026, reviewed June 24, 2026.
- International Organization for Standardization, ISO/IEC 42001:2023 Artificial intelligence management system, published 2023, reviewed June 24, 2026.
- International Organization for Standardization, ISO/IEC 42005:2025 AI system impact assessment, published 2025, reviewed June 24, 2026.
- International Organization for Standardization, ISO/IEC 42006:2025 requirements for bodies auditing and certifying AI management systems, published 2025, reviewed June 24, 2026.