Blog · Review Essay · May 2026

Data and Goliath and the Data Dragnet

Bruce Schneier's Data and Goliath was written for the post-Snowden privacy crisis, but its deepest argument has aged into the AI era: surveillance is not just watching. It is an institutional habit of turning ordinary life into usable data, then letting states and companies build power from the resulting asymmetry.

The Book

Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World was published by W. W. Norton & Company in March 2015. Schneier's official book page lists the hardcover at 320 pages, with hardcover ISBN 978-0393244816 and paperback ISBN 978-0393352177. The book became a New York Times bestseller. It appeared in the immediate aftermath of the Snowden disclosures, when the public had new evidence of how far intelligence agencies, telecommunications systems, platforms, and device ecosystems had already moved into everyday life.

Schneier's argument is broader than a complaint about one agency or one platform. He describes a world in which phones, payment systems, email, search, social media, cars, location services, loyalty programs, workplace systems, and commercial databases generate continuous records of conduct. Those records are useful to advertisers, fraud teams, police, intelligence agencies, data brokers, insurers, employers, and political actors. Privacy becomes difficult not because one person chose the wrong setting, but because the environment has been built to make data exhaust normal.

The book is organized into three parts that answer three practical questions: what world is being created, what is at stake, and what should be done about it. That structure is one reason it remains useful. It moves from diagnosis to institutional remedies instead of treating surveillance as a matter of individual virtue.

The Dragnet Model

The strongest part of Data and Goliath is its explanation of dragnet surveillance. Targeted investigation begins with a concrete reason to look. Dragnet surveillance begins by collecting broadly, storing first, and searching later. In the digital world, that shift is easy to hide because collection often looks like ordinary service operation: logs, backups, analytics, telemetry, ad targeting, security monitoring, recommendation signals, and customer records.

Once collection becomes normal, the burden moves to the person being watched. The user must understand invisible systems, read terms, choose settings, install tools, avoid platforms, resist convenience, and accept social penalties for opting out. That is not a meaningful bargain. It asks individuals to solve an infrastructure problem from inside the infrastructure.

Schneier is especially useful on the security tradeoff. Surveillance is often defended as protection, but broad collection can weaken security by creating valuable targets, encouraging backdoors, normalizing secrecy, and concentrating power in systems that attackers also want to exploit. A database built for safety can become a breach, a leak, a tool of discrimination, or a future authoritarian resource. The harm is not only that someone is looking now. The harm is that the record exists for later use by actors and regimes that may not yet be in place.

Public and Private Watchers

The book's most durable political insight is that government surveillance and corporate surveillance are not separate worlds. States want access to private-sector data. Companies want state-granted stability, contracts, legal immunity, infrastructure, identity systems, and market permission. The result is an exchange relationship: corporate data collection builds dossiers at scale, and government power gives those dossiers coercive consequence.

This matters because many privacy debates mislocate the problem. If the villain is only the state, then consumer platforms appear as private refuges. If the villain is only surveillance capitalism, then public agencies appear as neutral regulators. Schneier's frame is more uncomfortable: the same data can move among commerce, policing, intelligence, national security, litigation, employment, welfare administration, and political persuasion.

The practical effect is a new kind of institutional legibility. People are not only citizens before the state or customers before companies. They become profiles that can be repurposed. A location history can be convenience data, ad data, evidence, immigration data, protest data, workplace data, or insurance data. A social graph can be friendship, influence, risk, suspicion, or targeting substrate. The meaning of the record changes with the institution that gets to use it.

The AI-Age Reading

Read in 2026, Data and Goliath looks less like a finished account of privacy and more like a prehistory of AI data power. The book describes collection, storage, correlation, and profiling. The current AI stack adds inference, synthesis, memory, automated judgment, natural-language interfaces, and agentic action.

That addition changes the stakes. Data is no longer valuable only because it tells an institution what happened. It is valuable because it can train systems, personalize persuasion, predict vulnerability, simulate users, rank workers, summarize cases, generate risk narratives, and steer agents. The old dragnet made people searchable. The AI-era dragnet makes them modelable.

Large models also change the interface of surveillance. A person may not experience a system as surveillance at all. They may experience it as helpful memory, a convenient assistant, a fraud-prevention check, a personalized feed, an automated eligibility screen, a hiring platform, a classroom tool, a therapeutic chatbot, or a customer-service system. The interface can feel intimate while the underlying arrangement remains extractive.

This is where Schneier's insistence on power is still clarifying. The question is not only whether a model is accurate. It is who gathers the data, who sets the retention rules, who can combine datasets, who can compel access, who can infer sensitive facts, who can contest a generated classification, and who profits from turning life into prediction.

Where the Book Needs Updating

Data and Goliath is a 2015 book, and it shows. It predates the mainstream explosion of large language models, the current AI-agent boom, deepfake politics, modern data-center scale, app-store identity consolidation, the mature data-broker ecosystem of the 2020s, and the current wave of AI regulation. American University's 2022 review notes that the book had already become dated in some technical respects and that its policy frame is strongly U.S.-focused.

Those limits matter. Surveillance governance now has to account for biometric systems, cross-border cloud dependencies, synthetic media, model training, public-sector AI procurement, recommender systems, workplace analytics, school platforms, and the ability of models to infer sensitive traits from fragments. A privacy politics built only around 2015-era collection will miss how generated outputs and automated actions now complete the loop.

Even so, the book's age can be useful. It prevents the AI debate from pretending that extraction began with generative models. AI companies inherited a web already trained to collect, log, monetize, retain, and normalize personal data. The new systems are powerful partly because the social permission structure was built earlier.

The Site Reading

The recurring pattern across surveillance, platform governance, automated welfare, algorithmic scoring, and AI companions is the same: a system first makes people legible, then uses that legibility to shape the choices available to them. The harm is not always dramatic. Often it arrives as convenience, personalization, risk management, security, optimization, or care.

Data and Goliath is useful because it refuses the individualization of the problem. Better passwords, encrypted apps, browser settings, and privacy hygiene are worth having, but they cannot substitute for public rules about collection, retention, secondary use, compelled access, auditing, liability, and institutional accountability. Personal discipline is not enough when the business model and the state both benefit from the record.

The book also clarifies why AI governance needs data minimization at its center. A model-mediated society cannot treat every trace as legitimate raw material. Some data should not be collected. Some should not be retained. Some should not be reused for training. Some should not be fused with other datasets. Some should remain local, ephemeral, encrypted, or unavailable to institutions that would turn it into leverage.

The practical lesson is blunt: privacy is not nostalgia for a pre-digital self. It is a condition for agency, dissent, experimentation, repair, and unscored life. Without it, intelligence becomes easier to build, but human freedom becomes easier to administer.

Sources

Book links are paid affiliate links. As an Amazon Associate I earn from qualifying purchases.

