Blog · Review Essay · Last reviewed June 25, 2026

Data Cartels and the Information Monopoly Behind AI

Sarah Lamdan's Data Cartels is not only a privacy book. It is a map of the firms that sit between public knowledge and private dossiers: legal databases, academic publishing platforms, financial information services, news archives, and risk products built from personal data. Read in the age of legal AI, retrieval systems, data brokers, and automated institutions, the book shows that the model layer is often downstream from a quieter monopoly over what can be known, searched, priced, and acted on.

For this review, a data cartel is Lamdan's analytical term for a concentrated information business that controls essential data inputs, user-facing interfaces, analytics, and downstream decision tools. It is not a claim of a formal price-fixing cartel unless a court or regulator has made that specific legal finding. The civic problem is broader: public records, legal knowledge, research outputs, and personal traces can be enclosed together and sold back as institutional certainty. In AI systems, that enclosure becomes a data supply chain: the archive, query log, embedding index, ranking signal, and risk product can all be owned by the same gatekeeper.

The Book

Data Cartels: The Companies That Control and Monopolize Our Information was published by Stanford University Press in November 2022. The publisher lists the book at 224 pages, with hardcover ISBN 9781503615076, paperback ISBN 9781503633711, and ebook ISBN 9781503633728. Lamdan writes from an unusual crossing of professions: law, librarianship, privacy, open government, and scholarly communication.

The title names the argument. The companies at issue are not only vendors selling research tools or professional databases. They are information conglomerates whose businesses combine access to public knowledge with analytics products, personal data, risk scoring, and institutional decision support. The result is a system where the same corporate families can rent access to law or science while also selling data about the people, institutions, and markets that depend on those resources.

Lamdan's primary targets are RELX, the parent company behind LexisNexis and Elsevier, and Thomson Reuters, the parent company behind Westlaw and other professional information products. Scholarly reviews of the book summarize its cases across legal information, academic research, financial information, news, and data brokering. The pattern is less spectacular than a social network scandal, but more infrastructural: knowledge is enclosed, users are tracked, public records are recombined, and downstream institutions buy what looks like certainty.

That makes the book a useful companion to Data and Goliath, The Digital Person, The Costs of Connection, and The Black Box Society. Those books explain surveillance, dossiers, extraction, and opacity. Lamdan adds the library shelf, the legal search box, the citation index, the paywalled public record, and the analytics vendor that quietly turns them into institutional power.

Current Context

As of June 25, 2026, the book has become more relevant because professional information vendors now present themselves as AI infrastructure. RELX describes itself as an information-based analytics and decision-tools company. LexisNexis markets Lexis+ with Protégé as legal AI grounded in LexisNexis content, while Thomson Reuters markets CoCounsel and CoCounsel Legal as professional AI grounded in authoritative content and connected to Westlaw, Practical Law, Checkpoint, Microsoft 365, and document-management workflows.

That product context sharpens Lamdan's point. The strategic asset is not merely an interface or a model; it is the controlled corpus plus the workflow data around it. A vendor that owns legal content, citators, templates, docket access, user behavior, and professional workflow integration can offer AI as a helpful layer while deepening dependency on the underlying information enclosure.

The courts have already had to confront legal data as AI input. In a February 2025 District of Delaware opinion in Thomson Reuters Enterprise Centre GmbH v. ROSS Intelligence Inc., Judge Stephanos Bibas described Ross as using Thomson Reuters headnotes as AI data to create a legal research tool competing with Westlaw. The opinion is narrower than the whole AI-copyright debate, but it shows the practical stakes: editorial enhancements around public law can become controlled training material for legal AI competition.

The data-broker side has also moved from warning to enforcement. In January 2025, the FTC finalized an order prohibiting Gravy Analytics and Venntel from selling, disclosing, or using sensitive location data except in limited national-security or law-enforcement circumstances. In May 2026, the FTC announced a proposed order that would prohibit Kochava and its subsidiary from selling, sharing, licensing, transferring, or disclosing sensitive location data without affirmative express consent tied to a consumer-requested service. Those actions do not regulate the whole data-broker economy, but they confirm that location dossiers can create concrete consumer-protection harms.

California's Delete Act implementation makes the same problem operational. The state-run DROP system lets California residents submit one deletion request to active registered data brokers; brokers must begin processing DROP requests on August 1, 2026. DROP does not solve all data brokerage, because public-record exceptions, first-party data, derivative artifacts, and non-California regimes remain difficult. But it changes the audit question: a buyer should be able to prove whether brokered profiles, match keys, embeddings, scores, and downstream exports can actually be deleted.

The broader policy picture is uneven. The Justice Department's final bulk-data rule, effective April 8, 2025, prohibits or restricts certain data transactions involving U.S. sensitive personal data or government-related data and countries of concern. By contrast, the CFPB withdrew its proposed Regulation V data-broker rule on May 15, 2025, stating that it would take no further action on that notice of proposed rulemaking. The FTC's January 2025 surveillance-pricing work also shows why data cartels are not only privacy vendors: intermediary firms can use location, demographics, browsing, shopping behavior, and other signals to shape prices or offers. The result is not a clean privacy settlement. It is a patchwork of consumer-protection cases, national-security rules, state deletion systems, competition policy, and procurement choices.

Europe adds a different pressure point. The EU Data Act has applied since September 12, 2025, and includes rules on data access, fairer data-sharing terms, and switching between data-processing services. That is not a direct answer to Lamdan's U.S. examples, but it names a remedy category that matters for information monopoly: portability and interoperability are governance controls, not only convenience features.

The Cartel Shape

The strongest idea in Data Cartels is that information monopoly has a shape different from ordinary platform monopoly. Google, Meta, Amazon, Apple, and Microsoft are visible because ordinary people touch their products every day. RELX and Thomson Reuters are less culturally visible, but their systems sit behind lawyers, researchers, police, insurers, landlords, employers, financial professionals, journalists, and public agencies.

This matters because power becomes harder to contest when it is embedded in professional dependency. A lawyer may complain about Westlaw pricing but still need the product to serve clients. A university may dislike journal-package terms but still need access for faculty and students. A public agency may describe a data product as procurement rather than surveillance. A researcher may treat metrics as bad incentives while still living under them.

The cartel frame is not only about price. It is about fused control over inputs, interfaces, analytics, and downstream action. A firm that controls an archive can decide what is easy to find. A firm that controls usage data can monitor how the archive is used. A firm that sells analytics can convert that archive and usage data into rankings, risk profiles, market intelligence, or policing tools. A firm that also has government customers can make private extraction feel like public administration.

That is why Lamdan's book speaks so directly to machine learning. AI systems do not begin with a model. They begin with access: text, records, case law, filings, citations, identity data, financial signals, behavioral traces, metadata, and the right to process them. Control over information supply is control over what future systems can summarize, retrieve, compare, predict, and automate.

The safer definition is therefore infrastructural rather than conspiratorial. A data cartel is visible when the same vendor or corporate family controls the raw material, the usable interface, the usage trace, the analytics layer, and the institutional decision product. It can harm the public even without secret coordination because dependency alone changes who can know, verify, compete, and repair.
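
That definition can be made checkable with a short sketch. The layer labels, the function name, and the idea of recording one controlling corporate family per layer are assumptions introduced here for illustration; they are not Lamdan's terminology or any regulator's test.

```python
from typing import Dict, Optional

# The five layers named in the definition above. The identifiers are
# illustrative labels chosen for this sketch.
LAYERS = (
    "raw_material",
    "usable_interface",
    "usage_trace",
    "analytics_layer",
    "decision_product",
)


def fused_controller(controllers: Dict[str, str]) -> Optional[str]:
    """Return the corporate family that controls all five layers, if any.

    `controllers` maps a layer name to the corporate family (not the
    product brand) that controls it. A missing layer counts as
    undocumented, so no single controller is reported.
    """
    owners = {controllers.get(layer) for layer in LAYERS}
    if len(owners) == 1:
        (owner,) = owners
        return owner  # None here means every layer was undocumented
    return None
```

A reviewer would fill the mapping from contracts and corporate filings rather than product names, since the same parent can sit behind several brands. A result of None does not mean the market is healthy; it only means this one structural signal is absent or undocumented.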

The recurring danger is not that every database is corrupt. It is that a private evidence layer becomes easier for institutions to obey than for people to challenge. Once the vendor controls the record, the ranking, the price, the search path, and the exit terms, public memory has been moved into a contract.

Legal Knowledge as Infrastructure

The legal-information case is the book's cleanest example of public knowledge trapped inside private infrastructure. Law is supposed to bind everyone, so access to law should be a civic baseline. In practice, the usable version of legal knowledge often lives in proprietary databases: search interfaces, citators, annotations, editorial enhancements, historical coverage, docket access, and professional workflows.

This is not a minor inconvenience. If law becomes practically knowable only through expensive platforms, then the public domain is split into formal openness and operational access. The official material may exist somewhere, but the tools that make it usable are rented. Well-funded institutions get search, synthesis, alerts, updates, and context. Everyone else gets slower access, incomplete access, or no realistic access.

The AI problem arrives on top of that. Legal AI products need legal corpora, case metadata, user workflows, citation networks, drafting patterns, docket signals, and professional feedback. If the underlying knowledge layer is already enclosed, then AI does not democratize law by default. It can become another interface sitting over the same concentrated databases, making the old dependency feel like a new assistant.

Yale Law Library's 2025 Critical Legal AI Literacies series treated Lamdan's work in exactly these terms: as a lens on the privacy concerns that data cartels create in the development of legal AI. That framing is right. The question is not only whether a legal chatbot hallucinates. It is who owns the corpus, who sees the query, who retains the work product, which users improve the system, and whether access to legal understanding becomes more public or more deeply rented.

Dossiers and Delegated Surveillance

Lamdan's account of data brokering is strongest when it shows how ordinary institutional records become a market for delegated surveillance. Public records, location traces, addresses, property data, criminal records, social media signals, health-related inferences, court data, professional records, and commercial datasets do not remain separate. They are linked, cleaned, scored, packaged, and sold into contexts where people have little visibility or practical consent.

This is a direct continuation of the dossier problem that Daniel Solove described in The Digital Person. The danger is not only that someone knows a private fact. It is that a database version of a person travels into decisions about employment, housing, credit, insurance, policing, child welfare, benefits, immigration, health access, and risk. The dossier can be wrong, stale, decontextualized, or impossible to correct. It can still become operational truth.

Data Cartels sharpens that problem by showing how data markets let governments and other institutions route around the friction that should attach to official power. If an agency buys access to commercial dossiers, surveillance can look like procurement. If a contractor supplies risk analytics, discretion can look like vendor output. If a data broker assembles the profile, the institution can claim it is simply using available information.

A useful safety test is substitution. If an agency, employer, insurer, landlord, school, or platform would need consent, legal process, a consumer-reporting workflow, or an impact assessment to collect a signal directly, buying the signal through a broker should not launder that obligation away. The route of acquisition is part of the risk.

In an AI setting, that pattern becomes more dangerous. Models and agents can fuse the dossier with summarization, prioritization, triage, generation, and recommendation. They can make an old record easier to act on and harder to notice. The result is not only surveillance. It is a decision environment where institutional memory has been privatized, automated, and made to look like objective assistance.

Academic Memory and Metrics

The academic-publishing chapters matter because they connect surveillance to knowledge production itself. Many researchers meet companies like Elsevier as journal platforms, citation indexes, databases, and workflow tools. Lamdan asks readers to see them also as analytics firms. The same system that hosts articles can observe reading, citation, collaboration, institutional affiliation, grant activity, and research impact.

That changes what academic infrastructure is. A journal platform is not only a bookshelf. A citation index is not only a map. A metric is not only a neutral count. Each becomes a sensor and ranking system inside the institution that produces knowledge. It can influence hiring, tenure, funding, prestige, research agendas, and what topics appear worth pursuing.

The recursive loop is obvious: scholars produce research; publishers enclose and sell access to it; analytics products track and rank the research ecosystem; universities respond to the rankings; scholars adjust behavior to survive inside the metrics; the platform receives more signals. The system does not simply measure academic reality. It helps produce the academic reality it later measures.

This is why the book belongs beside The Tyranny of Metrics, Trust in Numbers, and Sorting Things Out. It gives a concrete industry case for a general institutional pattern: when measurement systems become infrastructure, they stop being mirrors and become incentives, gates, and memory.

Recursive Reality

Data Cartels is a book about recursive reality without needing that phrase. Information systems observe the world, sell the observation back to institutions, and then shape the next version of the world that can be observed.

A legal database changes how lawyers search and argue. Those arguments affect precedent, settlement, billing, and professional expectations. An academic metric changes which work is rewarded. Those rewards change the research record. A risk product changes how agencies prioritize people. Those priorities generate new records, encounters, denials, alerts, and classifications. A data broker sells a profile. The profile produces decisions that create more data about the profiled person.

The cycle is not always malicious. It is often banal: contracts, dashboards, procurement, licenses, APIs, institution-wide subscriptions, compliance logs, and database defaults. That banality is the point. The most consequential systems do not need to announce themselves as control systems. They can arrive as necessary professional infrastructure.

The book's deeper warning is that public reality can become privately mediated at the level of evidence. If the best law, science, records, metrics, and risk signals are held behind opaque systems, then public debate starts downstream from private curation. People still argue, vote, sue, teach, research, and govern, but the informational ground has already been sorted.

The AI Reading

Read in 2026, Data Cartels looks like a prehistory of AI procurement. Every enterprise wants retrieval, summarization, legal drafting, risk triage, research copilots, compliance monitors, customer scoring, fraud detection, and agentic workflows. Those systems need data access. The firms that already control critical information resources are therefore not just legacy vendors. They are positioned as AI infrastructure providers.

This shifts the AI-governance question. It is not enough to ask whether a model is accurate, biased, explainable, or secure. Those questions matter, but Lamdan pushes the reader one layer lower. Who controls the source material? What data was collected under a professional, civic, or educational relationship and then repurposed? Which public records have become private enrichment material? Which users are forced to feed the system because their institution cannot function without the vendor?

The legal-AI example is especially sharp. If a model is trained on or connected to proprietary legal databases, it may inherit a concentrated information market while presenting itself as a friendly access layer. The same can happen in science, finance, health, insurance, public administration, and education. A chatbot can feel democratizing at the surface while hardening dependency underneath.

Agentic products raise the stakes because the information vendor may no longer be only a place to search. It may draft, file, route, recommend, score, enrich, contact, bill, or change records. In that world, corpus control becomes action control. The question is not just whether the answer is right; it is whether the institution can reconstruct which proprietary source, brokered field, retrieved document, prompt, policy, and human approval caused the action.
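
The reconstruction question can be sketched as a record an institution might require for every agent action. This is a minimal illustration: the class, its field names, and the default list of required links are hypothetical, not a description of how any vendor's product logs its actions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ActionTrace:
    """Hypothetical record written each time an agent acts."""
    action: str
    proprietary_source: Optional[str] = None
    brokered_field: Optional[str] = None
    retrieved_document: Optional[str] = None
    prompt_id: Optional[str] = None
    policy_id: Optional[str] = None
    human_approver: Optional[str] = None


# Assumed default: not every action touches a proprietary source or a
# brokered field, so only these four links are required here.
DEFAULT_REQUIRED = ("retrieved_document", "prompt_id", "policy_id", "human_approver")


def missing_links(
    trace: ActionTrace, required: Tuple[str, ...] = DEFAULT_REQUIRED
) -> List[str]:
    """List the required links that were never recorded for this action."""
    return [name for name in required if getattr(trace, name) is None]
```

An action whose `missing_links` result is non-empty cannot be fully reconstructed afterward, which is a reason to block it or route it to review before it changes a record. Which links count as required is a policy decision for the institution, not something the sketch settles.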

The book therefore points toward a more infrastructural AI literacy. Ask where the corpus comes from, who owns the index, who sees the query, who stores the output, who can audit the record, what pricing model governs access, and whether the public receives durable knowledge or merely temporary permission to consult a rented interface.

Governance and Safety

The governance response should begin before model evaluation. Institutions need a vendor-and-corpus audit: what records, copyrighted editorial layers, public documents, user logs, prompts, embeddings, query histories, usage telemetry, and third-party data feeds sit underneath the AI product? Who may reuse them, for what purpose, under what retention period, and with what deletion, export, and audit rights?

The practical artifact is an information-supply ledger. For each corpus, API, brokered dataset, citation index, vector store, metric feed, user log, or enrichment service, the ledger should record source category, collection context, legal basis or permission claim, sensitive fields, derivative artifacts, training and retrieval permissions, retention period, deletion path, downstream recipients, audit rights, and exit format. Without that ledger, a model card can describe behavior while hiding the dependency that produced it.
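
A ledger row of that kind might look like the following sketch. The class name, field types, and the rule for what counts as undocumented are assumptions made for illustration; only the list of fields comes from the paragraph above.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class LedgerEntry:
    """One row of a hypothetical information-supply ledger."""
    asset: str  # a corpus, API, brokered dataset, vector store, user log, ...
    source_category: str = ""
    collection_context: str = ""
    legal_basis: str = ""  # legal basis or permission claim
    sensitive_fields: List[str] = field(default_factory=list)
    derivative_artifacts: List[str] = field(default_factory=list)
    training_permitted: Optional[bool] = None  # None means not yet documented
    retrieval_permitted: Optional[bool] = None
    retention_period: str = ""
    deletion_path: str = ""
    downstream_recipients: List[str] = field(default_factory=list)
    audit_rights: str = ""
    exit_format: str = ""


# Assumption of this sketch: for these list fields an empty list is a
# legitimate answer ("none"), so they are not reported as gaps.
MAY_BE_EMPTY = {"sensitive_fields", "derivative_artifacts", "downstream_recipients"}


def undocumented(entry: LedgerEntry) -> List[str]:
    """Return the names of fields that have not been filled in."""
    gaps = []
    for f in fields(entry):
        if f.name in MAY_BE_EMPTY:
            continue
        value = getattr(entry, f.name)
        if value is None or value == "":
            gaps.append(f.name)
    return gaps
```

Running `undocumented` over every entry before a contract is signed turns the ledger into a list of open questions for the vendor. A stricter variant could require an explicit "none" marker for the list fields, so that silence is never read as absence.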

For legal and scholarly systems, public-access obligations matter. Courts, agencies, libraries, universities, and research funders should not accept a world where the formal law or publicly funded research is technically public but practically usable only through private enrichment layers. Procurement should ask whether citations are inspectable, whether official sources are reachable, whether outputs can be reproduced outside the vendor, and whether a public institution can leave without losing its memory.

For data brokers and risk products, the minimum control set is different: sensitive-data minimization, purpose limits, affirmative consent where required, supplier verification, deletion of unlawfully collected data, dispute and correction channels, re-identification prohibitions, and bans or warrants around government use where rights are at stake. The FTC's location-data cases show one model of targeted enforcement, but they also show the weakness of a case-by-case approach when the market is broad and opaque.

For AI systems, NIST's govern-map-measure-manage rhythm is useful only if "map" includes the information supply chain. A model card that describes accuracy without naming corpus control, data brokerage, licensing, workflow telemetry, and vendor lock-in leaves out the political economy that Lamdan is warning about. Safety includes the safety of the knowledge layer.

Competition governance and privacy governance also have to work together. Data-sharing remedies can help rivals, but they can also spread sensitive data if minimization and anonymization are weak. Privacy restrictions can protect people, but they can also be used by incumbents to deny rivals access while keeping their own closed feedback loops. The public-interest test is whether the rule preserves contestability without turning people into the raw material of competition.

Procurement should therefore bind the remedy to the risk: data minimization before data sharing; official-source fallback before proprietary legal synthesis; export and deletion tests before renewal; public registers for consequential systems; notice and appeal where profiles affect rights; and termination rights when a vendor cannot prove provenance, accuracy, retention, or contestability.
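
Those pairings can be written down as gates, where each proposed step is blocked until its evidence exists. The sketch below is illustrative only: the step names, evidence labels, and function names are hypothetical, and a real procurement rule would need legal definitions behind each label.

```python
from typing import Dict, Iterable, List, Set

# Each gate pairs a proposed step with the evidence that must exist first.
GATES: Dict[str, Set[str]] = {
    "share_data": {"data_minimization"},
    "proprietary_legal_synthesis": {"official_source_fallback"},
    "renew_contract": {"export_test", "deletion_test"},
    "deploy_consequential_system": {"public_register_entry"},
    "use_profiles_affecting_rights": {"notice", "appeal"},
}

# What a vendor must be able to prove to avoid the termination right.
VENDOR_PROOFS: Set[str] = {"provenance", "accuracy", "retention", "contestability"}


def blocked_steps(proposed: Iterable[str], evidence: Set[str]) -> Dict[str, List[str]]:
    """Map each blocked step to the evidence it still lacks.

    An unknown step raises KeyError rather than passing silently.
    """
    blocked = {}
    for step in proposed:
        lacking = GATES[step] - evidence
        if lacking:
            blocked[step] = sorted(lacking)
    return blocked


def termination_triggered(proven: Set[str]) -> bool:
    """True when the vendor has failed to prove at least one required item."""
    return not VENDOR_PROOFS <= proven
```

For example, an institution holding only an export test and a minimization record would find renewal blocked on the missing deletion test while data sharing could proceed. Raising on unknown steps is a deliberate choice here, so that a new kind of use cannot slip through without someone deciding what evidence it requires.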

Where the Book Needs Friction

Data Cartels is forceful and useful, but its breadth can also compress differences among markets. Legal research, academic publishing, news, finance, and personal-data brokering share patterns of consolidation and analytics, but each has different legal constraints, user relationships, public-interest obligations, and possible remedies. A reader should take the cartel frame as a diagnostic lens, not a complete legal taxonomy.

The book is also centered on the United States. That focus is justified by the legal and market examples, but global information power is larger: European privacy law, Chinese data governance, transnational publishing markets, cross-border cloud infrastructure, immigration databases, open-access mandates, and developing-country access to science all complicate the picture. Lamdan gives a strong U.S. map. The map needs international overlays.

There is also a productive tension in the remedy. Treating information as a public good is necessary, but difficult. Public infrastructure can be underfunded, politically captured, inaccessible, badly maintained, or used for surveillance in its own right. The alternative to private cartels cannot be a vague wish for public data. It has to include libraries, courts, public-interest technology, privacy law, antitrust, procurement rules, open standards, independent audits, and real budgets for maintenance.

Those limits do not weaken the book. They clarify its use. It is best read as an alarm and a map: here are the firms, markets, incentives, and professional dependencies that make information monopoly feel normal. The next step is to design institutions strong enough to make public knowledge usable without turning everyone who seeks it into another data source.

What This Changes

The practical lesson is to audit the information vendor first, rather than auditing only the model.

For legal AI, research copilots, public-sector analytics, academic metrics, workplace scoring, fraud systems, and institutional agents, ask what information monopoly sits underneath the product. Does the vendor control the archive, the search interface, the analytics layer, and the usage data? Does it also sell personal dossiers or risk products? Are public records being turned into private scoring infrastructure? Can users inspect, correct, export, or delete what the system knows about them?

For public institutions, the question is not only procurement efficiency. It is whether a contract hands civic memory to a firm whose incentives are incompatible with public access. A court, school, library, agency, university, newsroom, or research funder that rents its knowledge layer may later discover that it also rented out its future choices.

Data Cartels matters because it refuses the comforting split between content and surveillance. The same pipeline can deliver law, sell analytics, track researchers, package dossiers, rank institutions, and feed AI systems. Once those functions are fused, the public does not only need better privacy settings. It needs a politics of information infrastructure.

Source Discipline

This review separates four evidence types. Stanford University Press supports book metadata. Lamdan's author page, SPARC, WIRED, Yale Law Library, and scholarly reviews support reception, argument, and professional context. RELX, LexisNexis, and Thomson Reuters pages support provider descriptions of products and corporate positioning; they do not independently prove safety, accuracy, fairness, or market effects. FTC, DOJ, CFPB, California, European Commission, NIST, and court records support legal, regulatory, and risk-management status.

Claims about data brokers should name the source and scope. A consent order about Kochava, Gravy Analytics, Venntel, or Mobilewalla is evidence about those entities and practices; it is not a finding about every broker. California DROP is a state deletion mechanism, not a universal erasure button. The FTC's surveillance-pricing materials are staff research and market-study context, not proof that a named retailer changed a named consumer's price. A withdrawn CFPB proposal is evidence that a proposed route was not pursued; it is not evidence that the underlying risks disappeared. A vendor FAQ is evidence of the vendor's stated policy; it is not an independent audit of compliance.

This page does not claim that any AI system is conscious, divine, or AGI. Its claim is institutional: AI products inherit the ownership, licensing, surveillance, and correction problems of the information systems underneath them.

Sources

Stanford University Press, Data Cartels: The Companies That Control and Monopolize Our Information, publisher record, November 2022 publication date, 224-page count, ISBNs, description, reviews, and author note, reviewed June 25, 2026.
RELX, 2025 Annual Report and corporate overview, official information-based analytics and decision-tools positioning, reviewed June 25, 2026.
LexisNexis, About LexisNexis, RELX division and corporate-positioning page, reviewed June 25, 2026.
LexisNexis, Lexis+ AI / Lexis+ with Protégé product page, official description of legal AI grounded in LexisNexis content, reviewed June 25, 2026.
Thomson Reuters, CoCounsel product page, official description of AI, content grounding, privacy, security, and retention claims, reviewed June 25, 2026.
Thomson Reuters, CoCounsel Legal product page, official description of legal AI grounded in Westlaw and Practical Law content, reviewed June 25, 2026.
U.S. District Court for the District of Delaware, Thomson Reuters Enterprise Centre GmbH v. ROSS Intelligence Inc., February 11, 2025 memorandum opinion, reviewed June 25, 2026.
Federal Trade Commission, "FTC Finalizes Order Prohibiting Gravy Analytics, Venntel from Selling Sensitive Location Data", January 14, 2025 final order announcement, reviewed June 25, 2026.
Federal Trade Commission, "FTC to Ban Kochava and Subsidiary from Selling Sensitive Location Data", May 4, 2026 proposed order announcement, reviewed June 25, 2026.
California Privacy Protection Agency, DROP for data brokers, official Delete Request and Opt-out Platform implementation page, reviewed June 25, 2026.
Federal Trade Commission, Surveillance Pricing feature page and January 2025 staff perspective announcement, reviewed June 25, 2026.
U.S. Department of Justice, "Preventing Access to U.S. Sensitive Personal Data and Government-Related Data by Countries of Concern or Covered Persons", final rule published in the Federal Register, reviewed June 25, 2026.
Consumer Financial Protection Bureau, "Protecting Americans From Harmful Data Broker Practices (Regulation V); Withdrawal of Proposed Rule", Federal Register notice, reviewed June 25, 2026.
European Commission, Data Act explained, official summary of access, cloud switching, interoperability, and application date, reviewed June 25, 2026.
NIST AI Resource Center, AI RMF Core, govern, map, measure, and manage functions for AI risk management, reviewed June 25, 2026.
Sarah Lamdan, Data Cartels, author book page with publisher and publication context, reviewed June 25, 2026.
SPARC, "Sarah Lamdan Discusses her New Book, Data Cartels", November 2022 webinar recap and interview context, reviewed June 25, 2026.
WIRED, Sarah Lamdan, "The Quiet Invasion of 'Big Information'", November 9, 2022, adapted excerpt from Data Cartels, reviewed June 25, 2026.
Yale Law Library, "Critical Legal AI Literacies: Sarah Lamdan on 'Data Cartels: The Companies that Control Legal AI'", December 9, 2025, legal-AI discussion context, reviewed June 25, 2026.
Sue Curry Jansen, review of Data Cartels, International Journal of Communication 17, 2023, reviewed June 25, 2026.
Tim Ribaric, "Book Review: Data Cartels", Canadian Journal of Academic Librarianship 9, 2023, reviewed June 25, 2026.
Brian Martin, review of Sarah Lamdan's Data Cartels, Prometheus: Critical Studies in Innovation 40, no. 1, 2024, reviewed June 25, 2026.

Book links are paid affiliate links. As an Amazon Associate I earn from qualifying purchases.


Amazon, Data Cartels by Sarah Lamdan, reviewed June 25, 2026.
