All Data Are Local and the Data Setting
Yanni Alexander Loukissas's All Data Are Local: Thinking Critically in a Data-Driven Society is a compact warning against one of the most durable fantasies of the AI era: that data can be lifted out of the world, scaled up, and made authoritative without carrying the marks of their origin. The book's value lies not only in its claim that data have context but also in the practical discipline of asking what setting made a data set possible before an institution turns it into evidence, prediction, or automated action.
The Book
All Data Are Local was published by MIT Press in 2019, with an ebook and hardcover listed for April 30, 2019, and a paperback listed for May 3, 2022. The MIT Press page gives the ebook ISBN as 9780262352222, the hardcover ISBN as 9780262039666, the paperback ISBN as 9780262545174, and notes 60 color illustrations. Information Research reviewed the 2019 hardcover as xix plus 245 pages (264 in all); retail and catalog records often describe the physical book as 272 pages, a gap typical of the difference between counting numbered front matter plus text and cataloging total pages.
Loukissas is a digital media scholar at Georgia Tech and the author of Co-Designers: Cultures of Computer Simulation in Architecture. The book grows out of a design-and-computation sensibility rather than a purely abstract theory of data. Its central move is to replace the phrase "data set" with the more demanding phrase "data setting": the local environment of instruments, institutions, maintenance work, formats, conventions, audiences, interfaces, and assumptions that allows data to exist as data.
The case studies are deliberately mundane and public-facing: Harvard's Arnold Arboretum, the Digital Public Library of America, the UCLA Television News Archive, and Zillow. That choice matters. The book is not only about secret state databases or exotic machine-learning systems. It asks readers to notice that even familiar, civic, searchable, apparently benign data collections are made under local conditions that shape what can later be known.
The Myth of Portable Data
The most useful AI-era lesson in All Data Are Local is that portability is never innocence. A spreadsheet, archive export, API response, training corpus, vector database, or benchmark can travel, but it does not travel empty. It carries naming conventions, missing fields, collection priorities, transcription errors, sensor limits, institutional incentives, update rhythms, and old decisions about what was worth recording.
That is an immediate challenge to model-mediated knowledge. Foundation models are often described through scale: tokens, parameters, compute, benchmarks, users, latency, cost. Loukissas pushes attention in the opposite direction. Before asking what a model can infer from data, ask how those data became available, who made them orderly, what local knowledge was stripped away, which communities were over-recorded or absent, and which future uses were never part of the original bargain.
The book is especially strong against the fantasy of "raw data." Rawness is usually a social achievement disguised as nature. A record has been selected, formatted, translated, normalized, cleaned, visualized, searched, or made interoperable before it becomes useful to the next system. By the time an AI product presents an answer, the local work that made the answer possible has often disappeared behind the fluency of the interface.
Interfaces Change the Data
One of Loukissas's six principles, as summarized by MIT Press, is that interfaces recontextualize data. This is a sharper claim than saying interfaces display data. A search page, dashboard, map, timeline, autocomplete menu, model response, or retrieval result changes the relation between a user and a record. It foregrounds some fields, hides others, creates defaults, ranks relevance, suggests comparisons, and gives the collection a social role.
That point lands hard in the AI transition. A source document inside an archive is one thing. A snippet in search is another. A generated answer that compresses the source into confident prose is another again. Each interface narrows and expands reality differently. It changes what counts as salient, what feels authoritative, what becomes shareable, and what future systems may ingest as a clean account of the world.
This is why data governance cannot stop at provenance labels or dataset documentation, although both matter. The interface that delivers data is part of the evidence system. A welfare portal, hiring dashboard, police intelligence screen, medical triage chatbot, school analytics page, or corporate copilot can make a local record feel like general truth. The user's next action is shaped not only by the data but by the form in which the data arrive.
Algorithm and Archive
Information Research's review is helpful because it notices the book's attention to algorithm-data entanglement. In the UCLA Television News Archive case, Loukissas examines word patterns around election coverage and shows how transcription and analysis conditions affect what the algorithm can surface. The lesson is not that computation is useless. The lesson is that algorithms and data function together inside contingent, material, and historical circumstances.
That should sound familiar to anyone evaluating AI systems. A benchmark is not just a score. It is a task definition, dataset history, grading convention, leakage risk, prompt format, and community of people who decide whether the result matters. A model evaluation is not just a number. It is an instrument built from old examples and present assumptions. A retrieval system is not just recall. It is an index, a chunking strategy, a ranking policy, and a decision about what sources deserve to be near the answer.
Loukissas gives language for refusing the split between technical and social explanations. When a system succeeds, its success depends on local work. When it fails, the failure is often local too: a category that never fit, a sensor that missed the important event, a transcript that mangled speech, a form that forced people into the wrong field, a dashboard that converted uncertainty into color.
Recursive Reality
The book becomes most relevant when data stop merely describing institutions and begin governing them. A collection makes a place legible. The institution acts on that legibility. People adapt to the action. The adaptation produces new records. The next model treats those records as evidence. The loop then claims the authority of the world it helped make.
That loop is visible across contemporary AI systems. A platform ranks content, creators optimize for the ranking, and the optimized content becomes the platform's evidence of user preference. A school measures learning through machine-readable artifacts, students produce work for the artifacts, and analytics systems summarize the changed behavior as education. A workplace turns labor into tickets, commits, chat logs, keystrokes, and productivity metrics, then trains tools that define good work through the trace left by earlier tools.
All Data Are Local helps slow that loop down. It asks where the data came from, but also where the data returned. Did a model simply learn from a setting, or did it begin to manage the setting? Did an archive preserve local knowledge, or did an interface flatten it into a portable signal? Did a data set support public understanding, or did it become a control surface?
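The loop described in this section can be made concrete with a toy simulation. The sketch below is not from the book: the two items, their equal appeal, the view counts, and the starting click record are all invented for illustration. It only shows how a ranking that allocates exposure can manufacture the record that later appears to justify it.

```python
def simulate(rounds=20, appeal=(0.5, 0.5), top_views=70, other_views=30):
    """Toy loop: rank by recorded clicks, expose by rank, record new clicks.

    All numbers are hypothetical. Both items have the same appeal, so any
    gap in the final record comes from exposure, not from preference.
    """
    clicks = [51.0, 49.0]  # nearly tied starting record (invented)
    for _ in range(rounds):
        # The "platform" ranks whichever item has more recorded clicks first.
        top = 0 if clicks[0] >= clicks[1] else 1
        views = [other_views, other_views]
        views[top] = top_views
        # New clicks depend on exposure, and they feed the next ranking.
        for i in (0, 1):
            clicks[i] += views[i] * appeal[i]
    return clicks


clicks = simulate()
share = clicks[0] / sum(clicks)
print(f"item 0 share of recorded clicks: {share:.2f}")  # prints 0.68
```

Read naively, the final record says users prefer item 0 by roughly two to one. Read with its setting attached, it says the platform showed item 0 more often.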
Where the Book Needs Friction
The book predates the current foundation-model boom, large-scale generative AI deployment, data-center politics, synthetic media pipelines, agentic tools, and the regulatory fights around training-data disclosure. Readers should not expect it to answer those questions directly. Its examples come from data studies, not from frontier-model governance.
There is also a risk in making "locality" too elastic. If every record has many attachments to many places, the word local can begin to mean context in general. The practical value comes back when the reader asks specific questions: which instrument, which institution, which audience, which interface, which update practice, which field, which excluded person, which future decision?
The book is strongest as an inspection habit, not as a total theory. It does not replace political economy, labor analysis, privacy law, civil-rights enforcement, infrastructure governance, or security engineering. It gives those fields a missing first move: do not let a data set enter the room as if it had no biography.
What This Changes
Read in 2026, All Data Are Local turns into a simple audit question for AI systems: what is the data setting behind this claim?
That question should be asked before procurement, deployment, publication, and appeal. What was collected? Who collected it? Under what authority? For what original purpose? What instruments and interfaces shaped it? What categories did it impose? What local knowledge did it preserve or erase? How was it cleaned? What was joined to it? Who can contest it? What happens when the system acts on it and produces new records?
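One way to keep those questions from being skipped is to treat them as required fields in a review record. The sketch below is a minimal illustration, not a format the book proposes: the class name, the field names, and the example answers are hypothetical, with one field per question in the paragraph above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DataSettingRecord:
    """Hypothetical audit record; field names are illustrative only."""
    what_was_collected: Optional[str] = None
    collected_by: Optional[str] = None
    authority: Optional[str] = None
    original_purpose: Optional[str] = None
    instruments_and_interfaces: Optional[str] = None
    imposed_categories: Optional[str] = None
    local_knowledge_kept_or_erased: Optional[str] = None
    cleaning_steps: Optional[str] = None
    joined_sources: Optional[str] = None
    contest_process: Optional[str] = None
    feedback_effects: Optional[str] = None


def unanswered(record):
    """Return the names of audit questions left blank or whitespace-only."""
    return [
        f.name
        for f in fields(record)
        if not (getattr(record, f.name) or "").strip()
    ]


# Invented example: a partially documented data set.
record = DataSettingRecord(
    what_was_collected="home sale listings",
    collected_by="county assessor exports",
    original_purpose="property tax assessment",
)
missing = unanswered(record)
print(f"{len(missing)} of {len(fields(record))} questions unanswered")
for name in missing:
    print(" -", name)
```

A reviewer could decline to approve procurement or deployment while `unanswered(record)` is non-empty. A filled-in field is still only a claim about the setting and deserves the same scrutiny as the data.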
The book's deeper warning is about humility. Data can travel farther than the conditions that made them meaningful. AI systems accelerate that travel by turning situated records into general answers, predictions, scores, summaries, and actions. Good governance keeps the setting attached. Bad governance lets the interface pretend that the setting never existed.
Sources
- MIT Press, All Data Are Local, publisher record, ebook, hardcover, paperback metadata, ISBNs, publication dates, case-study description, six principles, author note, and open-access link, reviewed June 15, 2026.
- Yanni Alexander Loukissas, "All Data Are Local: Thinking Critically in a Data-Driven Society", author page with overview, open-access and purchase links, and reception excerpts, reviewed June 15, 2026.
- MIT Press Direct, All Data Are Local, open-access edition landing page, reviewed June 15, 2026.
- T. D. Wilson, review of All data are local. Thinking critically in a data-driven society, Information Research 24, no. 2, review no. R663, June 2019, bibliographic metadata, case-study summary, and critical notes, reviewed June 15, 2026.
- WorldCat, All data are local: thinking critically in a data-driven society, OCLC bibliographic record and summary, reviewed June 15, 2026.