The Spreadsheet Becomes the Model Interface
Spreadsheets were already shadow models for institutions. AI does not merely make them easier to use. It makes the grid conversational, generative, and harder to govern.
The Grid Already Governed
The spreadsheet is one of the most powerful institutional interfaces ever built because it looks too ordinary to be treated as power.
A spreadsheet can be a budget, forecast, hiring plan, warehouse schedule, risk model, school roster, regulatory submission, grant tracker, clinical research table, campaign list, pricing tool, compliance register, or private government. It lets a person build a local world out of cells, formulas, filters, charts, macros, comments, colors, hidden tabs, copied assumptions, and small acts of judgment.
That local world often becomes operational fact. A manager approves headcount because the workbook says the variance fits. A trader accepts a risk number because the sheet recalculated. A public agency allocates resources because the table ranked applicants. A nonprofit reports impact because the dashboard summarized rows. The spreadsheet does not merely represent the institution. It becomes one of the places where the institution thinks.
This is why AI in spreadsheets matters. It is tempting to treat Copilot in Excel or Gemini in Sheets as another productivity feature: help me write a formula, explain an error, make a chart, summarize rows, fill missing fields, or clean inconsistent data. Those features are useful. But the important shift is deeper. The office grid is becoming a model-mediated interface. The user no longer only writes formulas into cells. The user asks a language model to interpret the workbook, generate transformations, explain failures, and sometimes modify the structure of the sheet itself.
The spreadsheet was already a quiet decision engine. AI gives that engine a voice.
AI Enters the Cell
Microsoft's current Copilot materials describe Excel support for importing data; highlighting, sorting, and filtering; creating and explaining formulas; building charts and PivotTables; and identifying insights such as summaries, trends, and outliers. Microsoft 365 Copilot is also tied to Microsoft Graph, which brings authorized work data from emails, chats, documents, meetings, and other organizational surfaces into the assistant's context.
The more revealing feature is the COPILOT function in Excel. Microsoft's support page describes it as a cell formula that sends a prompt and referenced grid data to an AI model hosted on Azure, then returns model-generated output directly in the workbook. The recommended use cases are semantic and generative: summarizing text, creating sample data, classifying or tagging content, generating short text, and looking up web information.
Microsoft also draws a boundary. The same support page says COPILOT can give incorrect responses and advises users to avoid it for numerical calculations requiring accuracy or reproducibility, workbook lookups that native functions should handle, and tasks with legal, regulatory, or compliance implications.
That warning is not a footnote. It names the institutional tension. Excel is where organizations do exactly the kinds of work that require accuracy, reproducibility, legal defensibility, and compliance memory. The AI function is framed as exploratory, but it is placed inside a tool whose social role is often official.
Google is moving from the other side of the office stack. In September 2025, Google announced that Gemini in Sheets could provide natural-language formula explanations, explain formula errors, generate corrected formulas in follow-up turns, and offer multiple formula options for complex tasks. In 2026, Google said Gemini in Sheets had reached a 70.48 percent success rate on the full SpreadsheetBench dataset, a benchmark for complex real-world spreadsheet manipulation.
SpreadsheetBench itself is useful context. The benchmark is built from real-world forum-style spreadsheet questions and workbooks with messy structures: missing headers, multiple tables in one sheet, multiple sheets, non-standard layouts, and tasks that require robust generalization. SpreadsheetBench 2 pushes further into end-to-end business spreadsheet workflows such as financial modeling, debugging, and visualization. The top overall score published for SpreadsheetBench 2 was 34.89 percent, a reminder that full workbook-level agency remains hard.
The direction is clear even where the scores are imperfect. The spreadsheet assistant is not only answering questions about a sheet. It is becoming a candidate operator of the sheet.
The Old Risk
AI does not enter a clean environment. Spreadsheets already have a long error history.
Raymond Panko's survey of spreadsheet error research concluded that spreadsheet errors are common and non-trivial, and that the only technique clearly demonstrated to reduce errors was cell-by-cell code inspection. Later work by Panko continued to emphasize a basic human-factors point: spreadsheet developers are often overconfident, and ordinary review practices miss errors.
The famous public cases matter because they show how a humble office artifact can sit under major institutional decisions. The U.S. Senate Permanent Subcommittee on Investigations report on JPMorgan Chase's 2012 "London Whale" losses described a value-at-risk model whose computation used spreadsheets and manual processes that were considered error-prone and not easily scalable. The report said key trading data were uploaded manually, spreadsheet-based calculations had insufficient controls, formula and code changes were frequent, and calculation errors lowered the VaR results. It was not a story about one bad cell. It was a story about governance, incentives, infrastructure, and model control.
Bank regulators have long understood this category. The Federal Reserve's supervisory guidance on model risk management says user-developed applications such as spreadsheets and ad hoc database applications used to generate quantitative estimates are particularly prone to model risk. The guidance is broader than spreadsheets, but it captures the core issue: a model can live outside the official model factory. If it materially affects decisions, validation and governance still matter.
The AI spreadsheet inherits this old risk and adds a new layer. A formula can be inspected. A cell reference can be traced. A macro can be reviewed. A model-generated classification, summary, explanation, repair, or data transformation may be harder to reproduce, harder to validate, and easier to mistake for understanding because it arrives in fluent language.
From Formula to Judgment
The spreadsheet's traditional power came from a promise of computability. If the assumptions are visible and the formulas are correct, the result follows. That promise was always partial because workbooks contain hidden assumptions, copied errors, stale links, ambiguous categories, and human choices disguised as arithmetic. Still, formulas gave the grid a kind of mechanical accountability.
AI changes the unit of work. Instead of "calculate this number from these cells," the user can ask: classify this feedback, identify outliers, explain this trend, generate an executive summary, clean this column, infer categories, suggest next steps, build a budget, repair the model, produce a chart, or tell me what matters.
Those are not only spreadsheet operations. They are acts of judgment. Classification decides what kind of thing a row is. Summarization decides what can be ignored. Outlier detection decides what deserves attention. Data cleaning decides which messiness is error and which is evidence. Chart generation decides what shape the story takes. Formula repair decides which logic counts as intended.
This does not make AI assistance illegitimate. It may help many users understand formulas they previously copied blindly. It may reduce some spreadsheet errors by explaining broken references, mismatched ranges, text-formatted dates, and missing assumptions. It may make spreadsheet work less dependent on one local power user who knows all the hidden formulas.
But it also creates a new authority gradient. The model can become the senior analyst inside the workbook. It can speak with confidence where the user has uncertainty. It can make a suggested formula look more legitimate because it explains itself. It can produce a chart that feels like insight before anyone has checked whether the data, labels, filters, and assumptions deserve that presentation.
Shadow Model Governance
Many institutions already struggle with end-user computing: business units building critical tools in spreadsheets, scripts, notebooks, local databases, or SaaS automations outside ordinary software governance. These tools exist because official systems are too slow, too rigid, too expensive, or too distant from operational knowledge. Shadow tools are often where real work happens.
AI makes shadow model governance harder because it lowers the cost of building and modifying these tools. A person who cannot write a complex formula may be able to ask for one. A person who cannot debug a workbook may ask the assistant to repair it. A team that would have waited for an analyst may generate a working model inside a shared spreadsheet. The barrier to action falls before the institution has updated its inventory, review, or approval process.
This is not only a security issue. It is a knowledge-governance issue. A workbook can encode business logic that no official system contains. When AI helps create, alter, or explain that workbook, the institution needs to know what changed, why it changed, who approved it, whether the output is reproducible, and whether the AI-generated part is still appropriate after data, policy, or market conditions shift.
The earlier essay "The AI Register Becomes Public Memory" argued that organizations need inventories of AI systems before accountability can begin. AI spreadsheets show why the inventory problem is difficult. The system may not look like an AI product. It may look like the same shared workbook the team has always used, now with AI-generated formulas, AI-filled fields, AI-created summaries, and an assistant sitting in the side panel.
Failure Modes
The first failure mode is semantic leakage into official numbers. A model-generated classification, summary, or lookup quietly feeds downstream calculations, reports, dashboards, or decisions as if it were a deterministic value.
The second is explanation as false assurance. A formula explanation sounds coherent, so the user trusts the formula. The explanation may describe a plausible intent without proving that the formula matches the business rule, data shape, or edge cases.
The third is repair without responsibility. An assistant fixes a broken workbook, but the institution does not record what changed, what assumption was selected, what alternative was rejected, or who owns the repaired logic.
The fourth is benchmark overreach. A spreadsheet agent performs well on a public dataset, so users infer readiness for financial reporting, regulatory submissions, clinical research, payroll, benefits, or other contexts where the real risk is not only task completion but auditability.
The fifth is shadow automation. AI lowers the skill threshold for building business-critical spreadsheets faster than governance can discover them. The organization gains local productivity and loses institutional memory.
The sixth is data-boundary confusion. A spreadsheet may contain customer records, employee data, protected health information, confidential projections, or regulated financial information. Users need to understand which AI features can read which cells, which files, which organizational data, and what labels or policies block use.
The seventh is human de-skilling. If the assistant becomes the usual route to formula design, debugging, and analysis, users may become less able to inspect the logic that governs their own work. Convenience can hollow out the very competence needed for oversight.
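The first failure mode, semantic leakage, is the one most open to a mechanical check. The sketch below is illustrative only: it assumes a workbook has already been exported to a hypothetical dictionary of cell addresses and formula text, and it looks for cells that call an AI function or depend on one. The function names and the sample cells are assumptions made for this example, and the reference matching is deliberately naive (single-sheet addresses only, no named ranges, range endpoints only, and function names ending in digits would be misread as cells).

```python
import re
from collections import deque

# Naive A1-style reference matcher; an assumption of this sketch, not a full parser.
CELL_REF = re.compile(r"\b[A-Z]{1,3}[0-9]+\b")
STRING_LITERAL = re.compile(r'"[^"]*"')


def references(formula):
    """Return the cell addresses a formula mentions, ignoring quoted text."""
    if not formula.startswith("="):
        return set()  # literals reference nothing
    return set(CELL_REF.findall(STRING_LITERAL.sub("", formula)))


def ai_tainted_cells(cells, marker="COPILOT("):
    """Return every cell that calls the AI function or depends on one."""
    # Build a reverse map: referenced cell -> cells that read it.
    dependents = {}
    for address, formula in cells.items():
        for ref in references(formula):
            dependents.setdefault(ref, set()).add(address)

    # Seed with cells whose formula (outside string literals) calls the marker.
    tainted = {
        address
        for address, formula in cells.items()
        if formula.startswith("=")
        and marker in STRING_LITERAL.sub("", formula).upper()
    }

    # Breadth-first walk through everything downstream of the seeds.
    queue = deque(tainted)
    while queue:
        for child in dependents.get(queue.popleft(), ()):
            if child not in tainted:
                tainted.add(child)
                queue.append(child)
    return tainted


# Hypothetical workbook snapshot: address -> formula or literal text.
cells = {
    "A2": "Late delivery, box damaged",
    "B2": '=COPILOT("Classify this feedback", A2)',
    "C2": '=IF(B2="Complaint", 1, 0)',
    "D2": "=C2*100",
    "E2": "=LEN(A2)",
}

print(sorted(ai_tainted_cells(cells)))  # ['B2', 'C2', 'D2']
```

In this toy workbook the score in D2 looks like ordinary arithmetic, yet it rests on a model-generated label two steps upstream. A real audit would need a proper formula parser and cross-sheet tracing; the point here is only that the dependency is discoverable if someone looks.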
The Governance Standard
A serious AI spreadsheet policy should begin by treating high-impact workbooks as models, not as casual documents.
First, classify critical spreadsheets. Any workbook that affects money movement, risk reporting, hiring, benefits, legal compliance, healthcare, research findings, public services, or customer treatment should have an owner, purpose, version history, review cadence, and risk rating.
Second, separate exploratory AI from official logic. AI can help draft, explain, prototype, and debug, but official calculations and regulated outputs should use reproducible formulas, documented transformations, and reviewable code paths.
Third, record AI-assisted changes. When an assistant generates or repairs formulas, fills fields, changes structure, or creates summaries used downstream, the workbook should preserve what was changed, by whom, with what prompt or instruction class, and under what review.
Fourth, validate against business rules, not only examples. A formula that works on visible rows may fail on edge cases. Review should include test cases, range checks, source reconciliation, hidden-sheet inspection, and sign-off from someone who understands the operational domain.
Fifth, restrict high-stakes AI functions by label and context. Confidential, regulated, or compliance-sensitive workbooks should not allow casual generative calls simply because the feature exists in the application.
Sixth, preserve user competence. AI explanations should be used to teach inspection, not replace it. A user who cannot explain a critical workbook should not be the only approver of AI changes to it.
Seventh, inventory the office layer. AI governance that only tracks purchased AI products will miss the everyday grid where decisions are prepared. Registers, audits, and model-risk programs need a path for AI-enabled spreadsheets, not only formal machine-learning systems.
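Several of these steps, in particular classifying critical workbooks and recording AI-assisted changes, amount to keeping a small structured record per workbook. The following is a minimal sketch of what such a record might hold, not a prescribed schema: every class name, field name, rating value, and sample entry is hypothetical and chosen only to mirror the attributes listed above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class AIChange:
    """One AI-assisted edit to a workbook (hypothetical schema)."""
    changed_on: date
    changed_by: str
    cell_range: str
    instruction_class: str            # e.g. "formula repair", not the raw prompt
    reviewer: Optional[str] = None    # stays None until someone signs off


@dataclass
class WorkbookRecord:
    """Register entry for a high-impact workbook (hypothetical schema)."""
    name: str
    owner: str
    purpose: str
    risk_rating: str                  # assumed scale: "low", "medium", "high"
    review_cadence_days: int
    last_reviewed: date
    ai_changes: list = field(default_factory=list)

    def unreviewed_ai_changes(self):
        """AI-assisted changes that nobody has signed off yet."""
        return [c for c in self.ai_changes if c.reviewer is None]

    def review_overdue(self, today):
        """True when the workbook has gone longer than its cadence without review."""
        return (today - self.last_reviewed).days > self.review_cadence_days


# Illustrative data only.
record = WorkbookRecord(
    name="headcount_plan.xlsx",
    owner="finance-ops",
    purpose="Quarterly headcount approval",
    risk_rating="high",
    review_cadence_days=90,
    last_reviewed=date(2026, 1, 15),
)
record.ai_changes.append(
    AIChange(date(2026, 3, 2), "analyst-a", "F2:F40", "formula repair")
)
record.ai_changes.append(
    AIChange(date(2026, 3, 5), "analyst-b", "H2:H40", "field fill",
             reviewer="controller-c")
)

pending = record.unreviewed_ai_changes()
print([c.cell_range for c in pending])
print(record.review_overdue(date(2026, 5, 1)))
```

The design choice worth noting is that the change log lives on the workbook's register entry rather than inside the workbook itself, so the record survives copies, renames, and overwritten cells. Whether that is practical depends on how an organization already tracks end-user computing.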
The Spiralist Reading
The spreadsheet is a small simulation machine. It turns a workplace into rows and columns, lets assumptions propagate, and gives the result the authority of recalculation. It is one of the places where recursive reality became ordinary before anyone called it that.
AI does not replace the spreadsheet's older power. It amplifies it. The grid becomes conversational. The formulas explain themselves. The messy table asks to be cleaned. The dashboard narrates its own significance. The assistant turns partial records into fluent managerial language. A local model of the world becomes easier to create, easier to believe, and harder to inspect at the moment of use.
The danger is not that every AI spreadsheet will be wrong. The danger is that wrongness becomes socially smoother. The workbook no longer fails with an obvious error code. It produces a plausible category, a plausible summary, a plausible chart, a plausible repair, a plausible recommendation. The institution sees less friction and calls it intelligence.
The better path is not to ban intelligence from the grid. It is to remember what the grid already is: an interface where representation becomes decision. AI can help users understand and govern that interface, but only if the institution refuses to let fluency substitute for validation.
The spreadsheet became powerful because it let ordinary workers build models without asking permission. That democratic force is real. So is the risk. In the AI office, the question is no longer only who can edit the cells. It is who can see when the model inside the cells has begun editing the institution back.
Sources
- Microsoft Support, COPILOT Function, reviewed May 2026.
- Microsoft Support, Get started with Copilot in Excel, reviewed May 2026.
- Microsoft Learn, What is Microsoft 365 Copilot?, reviewed May 2026.
- Google Workspace Updates, Gemini in Google Sheets now provides smarter, more conversational formula generation, September 24, 2025.
- Google, Gemini in Google Sheets just achieved state-of-the-art performance, reviewed May 2026.
- SpreadsheetBench, SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation, reviewed May 2026.
- Raymond R. Panko, Spreadsheet Errors: What We Know. What We Think We Can Do, arXiv, 2008 version of EuSpRIG 2000 paper.
- Federal Reserve, Supervisory Guidance on Model Risk Management, SR 11-7, April 4, 2011.
- U.S. Senate Permanent Subcommittee on Investigations, JPMorgan Chase Whale Trades: A Case History of Derivatives Risks and Abuses, March 2013.
- Church of Spiralism Wiki, entries on Microsoft AI, AI Agents, Opaque Scoring Systems, and NIST AI Risk Management Framework.