Data Cartels and the Information Monopoly Behind AI
Sarah Lamdan's Data Cartels is not only a privacy book. It is a map of the firms that sit between public knowledge and private dossiers: legal databases, academic publishing platforms, financial information services, news archives, and risk products built from personal data. Read in the age of legal AI, retrieval systems, data brokers, and automated institutions, the book shows that the model layer is often downstream from a quieter monopoly over what can be known, searched, priced, and acted on.
The Book
Data Cartels: The Companies That Control and Monopolize Our Information was published by Stanford University Press in November 2022. The publisher lists the book at 224 pages, with hardcover ISBN 9781503615076, paperback ISBN 9781503633711, and ebook ISBN 9781503633728. Lamdan writes from an unusual intersection of professions: law, librarianship, privacy, open government, and scholarly communication.
The title names the argument. The companies at issue are not only vendors selling research tools or professional databases. They are information conglomerates whose businesses combine access to public knowledge with analytics products, personal data, risk scoring, and institutional decision support. The result is a system where the same corporate families can rent access to law or science while also selling data about the people, institutions, and markets that depend on those resources.
Lamdan's primary targets are RELX, the parent company behind LexisNexis and Elsevier, and Thomson Reuters, the parent company behind Westlaw and other professional information products. Scholarly reviews of the book summarize its cases across legal information, academic research, financial information, news, and data brokering. The pattern is less spectacular than a social network scandal, but more infrastructural: knowledge is enclosed, users are tracked, public records are recombined, and downstream institutions buy what looks like certainty.
That makes the book a useful companion to Data and Goliath, The Digital Person, The Costs of Connection, and The Black Box Society. Those books explain surveillance, dossiers, extraction, and opacity. Lamdan adds the library shelf, the legal search box, the citation index, the paywalled public record, and the analytics vendor that quietly turns them into institutional power.
The Cartel Shape
The strongest idea in Data Cartels is that information monopoly has a shape different from ordinary platform monopoly. Google, Meta, Amazon, Apple, and Microsoft are visible because ordinary people touch their products every day. RELX and Thomson Reuters are less culturally visible, but their systems sit behind lawyers, researchers, police, insurers, landlords, employers, financial professionals, journalists, and public agencies.
This matters because power becomes harder to contest when it is embedded in professional dependency. A lawyer may complain about Westlaw pricing but still need the product to serve clients. A university may dislike journal-package terms but still need access for faculty and students. A public agency may describe a data product as procurement rather than surveillance. A researcher may treat metrics as bad incentives while still living under them.
The cartel frame is not only about price. It is about fused control, in which one firm holds the inputs, the interfaces, the analytics, and the downstream action. A firm that controls an archive can decide what is easy to find. A firm that controls usage data can monitor how the archive is used. A firm that sells analytics can convert that archive and usage data into rankings, risk profiles, market intelligence, or policing tools. A firm that also has government customers can make private extraction feel like public administration.
That is why Lamdan's book speaks so directly to machine learning. AI systems do not begin with a model. They begin with access: text, records, case law, filings, citations, identity data, financial signals, behavioral traces, metadata, and the right to process them. Control over information supply is control over what future systems can summarize, retrieve, compare, predict, and automate.
Legal Knowledge as Infrastructure
The legal-information case is the book's cleanest example of public knowledge trapped inside private infrastructure. Law is supposed to bind everyone, so access to law should be a civic baseline. In practice, the usable version of legal knowledge often lives in proprietary databases: search interfaces, citators, annotations, editorial enhancements, historical coverage, docket access, and professional workflows.
This is not a minor inconvenience. If law becomes practically knowable only through expensive platforms, then the public domain is split into formal openness and operational access. The official material may exist somewhere, but the tools that make it usable are rented. Well-funded institutions get search, synthesis, alerts, updates, and context. Everyone else gets slower access, incomplete access, or no realistic access.
The AI problem arrives on top of that. Legal AI products need legal corpora, case metadata, user workflows, citation networks, drafting patterns, docket signals, and professional feedback. If the underlying knowledge layer is already enclosed, then AI does not democratize law by default. It can become another interface sitting over the same concentrated databases, making the old dependency feel like a new assistant.
Yale Law Library's 2025 Critical Legal AI Literacies series treated Lamdan's work in exactly this way, using it to discuss the privacy concerns that data cartels create in the development of legal AI. That framing is right. The question is not only whether a legal chatbot hallucinates. It is who owns the corpus, who sees the query, who retains the work product, which users improve the system, and whether access to legal understanding becomes more public or more deeply rented.
Dossiers and Delegated Surveillance
Lamdan's account of data brokering is strongest when it shows how ordinary institutional records become a market for delegated surveillance. Public records, location traces, addresses, property data, criminal records, social media signals, health-related inferences, court data, professional records, and commercial datasets do not remain separate. They are linked, cleaned, scored, packaged, and sold into contexts where people have little visibility or practical consent.
This is a direct continuation of the dossier problem that Daniel Solove described in The Digital Person. The danger is not only that someone knows a private fact. It is that a database version of a person travels into decisions about employment, housing, credit, insurance, policing, child welfare, benefits, immigration, health access, and risk. The dossier can be wrong, stale, decontextualized, or impossible to correct. It can still become operational truth.
Data Cartels sharpens that problem by showing how data markets let governments and other institutions route around the friction that should attach to official power. If an agency buys access to commercial dossiers, surveillance can look like procurement. If a contractor supplies risk analytics, discretion can look like vendor output. If a data broker assembles the profile, the institution can claim it is simply using available information.
In an AI setting, that pattern becomes more dangerous. Models and agents can fuse the dossier with summarization, prioritization, triage, generation, and recommendation. They can make an old record easier to act on and harder to notice. The result is not only surveillance. It is a decision environment where institutional memory has been privatized, automated, and made to look like objective assistance.
Academic Memory and Metrics
The academic-publishing chapters matter because they connect surveillance to knowledge production itself. Many researchers encounter companies like Elsevier as journal platforms, citation indexes, databases, and workflow tools. Lamdan asks readers to see them also as analytics firms. The same system that hosts articles can observe reading, citation, collaboration, institutional affiliation, grant activity, and research impact.
That changes what academic infrastructure is. A journal platform is not only a bookshelf. A citation index is not only a map. A metric is not only a neutral count. Each becomes a sensor and ranking system inside the institution that produces knowledge. It can influence hiring, tenure, funding, prestige, research agendas, and what topics appear worth pursuing.
The recursive loop is easy to trace: scholars produce research; publishers enclose and sell access to it; analytics products track and rank the research ecosystem; universities respond to the rankings; scholars adjust behavior to survive inside the metrics; the platform receives more signals. The system does not simply measure academic reality. It helps produce the academic reality it later measures.
This is why the book belongs beside The Tyranny of Metrics, Trust in Numbers, and Sorting Things Out. It gives a concrete industry case for a general institutional pattern: when measurement systems become infrastructure, they stop being mirrors and become incentives, gates, and memory.
Recursive Reality
Data Cartels is a book about recursive reality without needing that phrase. Information systems observe the world, sell the observation back to institutions, and then shape the next version of the world that can be observed.
A legal database changes how lawyers search and argue. Those arguments affect precedent, settlement, billing, and professional expectations. An academic metric changes which work is rewarded. Those rewards change the research record. A risk product changes how agencies prioritize people. Those priorities generate new records, encounters, denials, alerts, and classifications. A data broker sells a profile. The profile produces decisions that create more data about the profiled person.
The cycle is not always malicious. It is often banal: contracts, dashboards, procurement, licenses, APIs, institution-wide subscriptions, compliance logs, and database defaults. That banality is the point. The most consequential systems do not need to announce themselves as control systems. They can arrive as necessary professional infrastructure.
The book's deeper warning is that public reality can become privately mediated at the level of evidence. If the best law, science, records, metrics, and risk signals are held behind opaque systems, then public debate starts downstream from private curation. People still argue, vote, sue, teach, research, and govern, but the informational ground has already been sorted.
The AI Reading
Read in 2026, Data Cartels looks like a prehistory of AI procurement. Every enterprise wants retrieval, summarization, legal drafting, risk triage, research copilots, compliance monitors, customer scoring, fraud detection, and agentic workflows. Those systems need data access. The firms that already control critical information resources are therefore not just legacy vendors. They are positioned as AI infrastructure providers.
This shifts the AI-governance question. It is not enough to ask whether a model is accurate, biased, explainable, or secure. Those questions matter, but Lamdan pushes the reader one layer lower. Who controls the source material? What data was collected under a professional, civic, or educational relationship and then repurposed? Which public records have become private enrichment material? Which users are forced to feed the system because their institution cannot function without the vendor?
The legal-AI example is especially sharp. If a model is trained on or connected to proprietary legal databases, it may inherit a concentrated information market while presenting itself as a friendly access layer. The same can happen in science, finance, health, insurance, public administration, and education. A chatbot can feel democratizing at the surface while hardening dependency underneath.
The book therefore points toward a more infrastructural AI literacy. Ask where the corpus comes from, who owns the index, who sees the query, who stores the output, who can audit the record, what pricing model governs access, and whether the public receives durable knowledge or merely temporary permission to consult a rented interface.
Where the Book Needs Friction
Data Cartels is forceful and useful, but its breadth can also compress differences among markets. Legal research, academic publishing, news, finance, and personal-data brokering share patterns of consolidation and analytics, but each has different legal constraints, user relationships, public-interest obligations, and possible remedies. A reader should take the cartel frame as a diagnostic lens, not a complete legal taxonomy.
The book is also centered on the United States. That focus is justified by the legal and market examples, but global information power is larger: European privacy law, Chinese data governance, transnational publishing markets, cross-border cloud infrastructure, immigration databases, open-access mandates, and developing-country access to science all complicate the picture. Lamdan gives a strong U.S. map. The map needs international overlays.
There is also a productive tension in the remedy. Treating information as a public good is necessary, but difficult. Public infrastructure can be underfunded, politically captured, inaccessible, badly maintained, or used for surveillance in its own right. The alternative to private cartels cannot be a vague wish for public data. It has to include libraries, courts, public-interest technology, privacy law, antitrust, procurement rules, open standards, independent audits, and real budgets for maintenance.
Those limits do not weaken the book. They clarify its use. It is best read as an alarm and a map: here are the firms, markets, incentives, and professional dependencies that make information monopoly feel normal. The next step is to design institutions strong enough to make public knowledge usable without turning everyone who seeks it into another data source.
What This Changes
The practical lesson is to audit the information vendor before auditing only the model.
For legal AI, research copilots, public-sector analytics, academic metrics, workplace scoring, fraud systems, and institutional agents, ask what information monopoly sits underneath the product. Does the vendor control the archive, the search interface, the analytics layer, and the usage data? Does it also sell personal dossiers or risk products? Are public records being turned into private scoring infrastructure? Can users inspect, correct, export, or delete what the system knows about them?
For public institutions, the question is not only procurement efficiency. It is whether a contract hands civic memory to a firm whose incentives are incompatible with public access. A court, school, library, agency, university, newsroom, or research funder that rents its knowledge layer may later discover that it also rented out its future choices.
Data Cartels matters because it refuses the comforting split between content and surveillance. The same pipeline can deliver law, sell analytics, track researchers, package dossiers, rank institutions, and feed AI systems. Once those functions are fused, the public does not only need better privacy settings. It needs a politics of information infrastructure.
Sources
- Stanford University Press, Data Cartels: The Companies That Control and Monopolize Our Information, publisher record, publication date, page count, ISBNs, description, reviews, and author note, reviewed June 14, 2026.
- Sarah Lamdan, Data Cartels, author book page with publisher and publication context, reviewed June 14, 2026.
- SPARC, "Sarah Lamdan Discusses her New Book, Data Cartels", November 2022 webinar recap and interview context, reviewed June 14, 2026.
- WIRED, Sarah Lamdan, "The Quiet Invasion of 'Big Information'", November 9, 2022, adapted excerpt from Data Cartels, reviewed June 14, 2026.
- Yale Law Library, "Critical Legal AI Literacies: Sarah Lamdan on 'Data Cartels: The Companies that Control Legal AI'", December 9, 2025, legal-AI discussion context, reviewed June 14, 2026.
- Sue Curry Jansen, review of Data Cartels, International Journal of Communication 17, 2023, reviewed June 14, 2026.
- Tim Ribaric, "Book Review: Data Cartels", Canadian Journal of Academic Librarianship 9, 2023, reviewed June 14, 2026.
- Brian Martin, review of Sarah Lamdan's Data Cartels, Prometheus: Critical Studies in Innovation 40, no. 1, 2024, reviewed June 14, 2026.
Book links are paid affiliate links. As an Amazon Associate I earn from qualifying purchases.
- Amazon, Data Cartels by Sarah Lamdan.