Data and Goliath and the Data Dragnet
Bruce Schneier's Data and Goliath was written for the post-Snowden privacy crisis, but its strongest argument has become more useful in the AI era: a data dragnet is not just a large database. It is a lifecycle that collects broadly, stores beyond the immediate transaction, links across contexts, and lets institutions recombine ordinary life into profiles, scores, predictions, and leverage.
For this review, surveillance means persistent collection, retention, linkage, inference, and institutional use of traces about people, whether the watcher is a state agency, platform, broker, employer, school, insurer, advertiser, or model provider. The harm is not only being seen. It is being made governable through records that travel farther than the original encounter.
The governance test is therefore simple: can an institution prove why a trace was collected, where it moved, what was derived from it, who can use it, when it expires, and how the person affected can contest or erase the record?
The Book
Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World was published by W. W. Norton & Company in March 2015. Schneier's official book page lists the book at 320 pages, identifies it as a New York Times bestseller, gives the hardcover ISBN as 9780393244816 and the paperback ISBN as 9780393352177, and frames the book around the shared surveillance power of corporations and governments.
The book sits in the immediate aftermath of the Snowden disclosures, but it is not only a book about the NSA. Schneier describes a world in which phones, payment systems, email, search, social media, cars, location services, loyalty programs, workplace tools, ad exchanges, security logs, and commercial databases generate continuous records of conduct. Those records become useful to advertisers, fraud teams, police, intelligence agencies, data brokers, insurers, employers, political actors, and any vendor that can turn behavior into a product.
That is why the book remains relevant beside The Digital Person, Liquid Surveillance, The Age of Surveillance Capitalism, and Data Cartels. It gives a security engineer's vocabulary for a cultural and institutional problem: once daily life is made machine-readable, privacy cannot be protected by asking each person to outsmart every service, broker, sensor, agency, and contract.
Current Context
As of June 25, 2026, Schneier's public-private thesis is easier to verify than it was in 2015. The FTC's 2014 data-broker report remains a baseline statement of limited consumer visibility; later FTC orders and proposed orders around sensitive location data show the same market operating through mobile SDKs, ad exchanges, data brokers, and government or commercial buyers. The FTC's 2024 report on major social media and video streaming services adds the platform layer: concerns about broad collection, retention, youth safeguards, advertising, algorithms, and AI now sit inside the same surveillance economy.
Official security policy has also absorbed the lesson. ODNI's 2024 commercially available information framework treats purchased data as a privacy and civil-liberties problem for intelligence agencies. DOJ's Data Security Program, effective April 8, 2025, treats bulk genomic, geolocation, biometric, health, financial, government-related, and other sensitive personal data as a national-security concern when accessible to countries of concern or covered persons. These measures are partial and jurisdiction-bound, but they confirm the core pattern: commercial traces can become state power, foreign-access risk, AI-development fuel, and coercive leverage.
The Dragnet Model
The central concept is dragnet surveillance. Targeted investigation begins with a reason to look. Dragnet surveillance reverses the order: collect first, retain broadly, search later, and justify the collection by pointing to possible future usefulness. Digital systems make that reversal easy to normalize because collection often appears as ordinary operation: telemetry, logs, backups, location history, ad targeting, recommendation signals, fraud detection, workplace analytics, support records, and customer data.
A dragnet is therefore not defined by one camera or one database. It is defined by a lifecycle. Data is gathered outside a specific suspicion, kept beyond the immediate transaction, made searchable or linkable, and exposed to secondary users whose purposes were not visible at collection. A phone location trace can become navigation support, ad inventory, evidence, immigration intelligence, protest monitoring, workplace proof, insurance signal, or brokered commercial product. The meaning of the record changes when the institution changes.
Metadata is central to that lifecycle. A message body may be encrypted, but timing, location, sender, recipient, device, account, payment, app, and network patterns can still reveal social relationships, routines, pressure points, and anomalies. The dragnet does not need to know everything about a person to make the person administratively legible. It only needs enough linked traces to let another system infer what matters.
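The inference described above can be sketched with a few lines of Python. This is a minimal illustration, not a description of any real system: the records, names, tower labels, and the three helper functions are all invented, and the only point is that relationships and routines fall out of timing, counterpart, and location fields without any message content.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical message metadata: (sender, recipient, ISO timestamp, cell tower).
# No message bodies are present; every value is invented for illustration.
RECORDS = [
    ("alice", "clinic", "2026-03-02T08:55", "tower-17"),
    ("alice", "clinic", "2026-03-09T08:57", "tower-17"),
    ("alice", "clinic", "2026-03-16T08:53", "tower-17"),
    ("alice", "bob", "2026-03-09T23:40", "tower-04"),
    ("alice", "bob", "2026-03-10T23:52", "tower-04"),
    ("alice", "union-rep", "2026-03-11T12:05", "tower-09"),
]


def contact_counts(records):
    """Count messages per (sender, recipient) pair."""
    return Counter((sender, recipient) for sender, recipient, _, _ in records)


def late_night_contacts(records, start_hour=22):
    """Return the recipients each sender messages at or after start_hour."""
    result = defaultdict(set)
    for sender, recipient, timestamp, _ in records:
        if datetime.fromisoformat(timestamp).hour >= start_hour:
            result[sender].add(recipient)
    return dict(result)


def usual_tower(records, sender):
    """Return the tower a sender's messages most often originate from."""
    towers = Counter(tower for s, _, _, tower in records if s == sender)
    return towers.most_common(1)[0][0]


if __name__ == "__main__":
    print(contact_counts(RECORDS).most_common())
    print(late_night_contacts(RECORDS))
    print(usual_tower(RECORDS, "alice"))
```

Even on this toy data, three short functions surface a recurring weekly appointment, a late-night contact, and a habitual location. Joining the same fields to payment or device records would only narrow the picture further.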
Once that lifecycle becomes normal, the burden shifts to the watched person. The user must understand invisible systems, read terms, choose settings, install tools, avoid platforms, resist convenience, and accept social penalties for opacity. That is not a meaningful bargain. It asks individuals to solve an infrastructure problem from inside the infrastructure.
Schneier is especially strong on the security tradeoff. Broad collection is often defended as protection, but a retained dataset is also a target, a breach surface, a subpoena surface, a temptation for mission creep, and a future authoritarian resource. The harm is not only that someone is looking now. The harm is that a record exists for later actors, future regimes, breached vendors, model trainers, and automated decision systems that the subject never faced when the trace was produced.
Public and Private Watchers
The book's most durable political insight is that government surveillance and corporate surveillance are not separate worlds. States want access to private-sector data. Companies want state-granted stability, contracts, legal immunity, infrastructure, identity systems, and market permission. The result is an exchange relationship: corporate collection builds dossiers at scale, and government power can give those dossiers coercive consequence.
This matters because privacy debates often mislocate the problem. If the only danger is the state, then consumer platforms look like private refuge. If the only danger is surveillance capitalism, public agencies look like neutral regulators. Schneier's frame is more uncomfortable: the same data can move among commerce, policing, intelligence, national security, litigation, employment, welfare administration, immigration, schooling, and political persuasion.
The current data-broker record confirms that this is not only a 2015 concern. The FTC's 2014 data-broker report described a market with limited consumer visibility. Since then, the agency has brought location-data cases against X-Mode/Outlogic, Gravy Analytics/Venntel, and Mobilewalla, including orders or proposed orders restricting sensitive location data tied to places such as health care facilities, places of worship, schools, childcare facilities, political gatherings, and military installations. Mobilewalla is especially relevant to the advertising stack because the FTC said the final order banned the company from collecting consumer data from online real-time bidding ad exchanges for purposes other than participating in those auctions.
The Office of the Director of National Intelligence's 2024 framework for commercially available information likewise treats purchased data as a privacy and civil-liberties issue for intelligence agencies. The public-private line is porous because data can be bought, licensed, compelled, enriched, or routed through vendors before it looks like official action.
The Department of Justice's Data Security Program makes the same point from a national-security direction. Effective April 8, 2025, the program prohibits or restricts certain transactions that give countries of concern, or covered persons tied to them, access to U.S. government-related data or Americans' bulk genomic, geolocation, biometric, health, financial, or other sensitive personal data. The rule is not a general privacy law, but it is an official admission that commercial data markets can become surveillance, counterintelligence, AI-development, and coercion infrastructure.
The Data Supply Chain
The book is most useful today when read as a map of the data supply chain. A modern surveillance system rarely has one owner. It has sources, SDKs, data brokers, cloud providers, analytics products, identity graphs, model developers, app stores, advertisers, government purchasers, security vendors, and institutions that consume scores or summaries. Each participant can claim it is only handling a fragment. The person experiences the combined effect.
This is where Data and Goliath connects to Data Brokers, AI Data Retention, Data Minimization, and Real-Time Bidding. The dragnet is not finished when data is collected. It continues through retention schedules, access permissions, enrichment, embeddings, model training, retrieval indexes, vendor subprocessors, law-enforcement requests, and deletion failures. A record can become less visible to the person while becoming more useful to institutions.
That supply-chain view also clarifies why consent screens are weak. A person may agree to one service interaction without understanding that the data can become a broker product, model feature, fraud signal, risk flag, or agency lead. A meaningful governance regime has to follow the record across systems rather than treating the first click as permission for every later use.
The AI-Age Reading
Read in 2026, Data and Goliath looks less like a finished account of privacy than a prehistory of AI data power. The book describes collection, storage, correlation, profiling, and institutional asymmetry. The current AI stack adds inference, synthesis, memory, automated judgment, natural-language interfaces, and agents that can act through tools.
That addition changes the stakes. Data is no longer valuable only because it tells an institution what happened. It can train systems, personalize persuasion, infer vulnerability, rank workers, summarize cases, generate risk narratives, produce lookalike audiences, enrich dossiers, and steer delegated actions. The old dragnet made people searchable. The AI-era dragnet makes them modelable, scoreable, and operationally available.
Large models also change the interface of surveillance. A person may not experience the system as surveillance at all. They may experience it as helpful memory, a convenient assistant, a fraud-prevention check, a personalized feed, an automated eligibility screen, a hiring platform, a classroom tool, a therapy-like chatbot, or a customer-service workflow. The interface can feel intimate while the underlying arrangement remains extractive. The same problem appears in the site's note on agent action receipts: logs can make delegated machine action accountable, or they can become a new archive of private behavior.
The FTC's September 2024 staff report on major social media and video streaming services makes this link concrete. It found broad data collection, weak minimization and retention practices, inadequate youth safeguards, and widespread use of personal information in algorithms, data analytics, and AI systems. In Schneier's terms, the dragnet did not disappear into AI; it became part of the substrate that automated systems read from.
This is where Schneier's insistence on power is still clarifying. The question is not only whether a model is accurate. It is who gathers the data, who sets the retention rules, who can combine datasets, who can compel access, who can infer sensitive facts, who can contest a generated classification, and who profits from turning life into prediction.
Governance and Safety
The practical answer cannot be only personal privacy hygiene. Encryption, password managers, tracker blocking, and careful settings matter, but they do not govern the platforms, brokers, agencies, model providers, and vendors that structure the data environment. Schneier's argument points toward institutional controls.
For ordinary systems, the baseline is lifecycle discipline: inventory data, map flows, justify collection, minimize fields, limit retention, restrict access, test deletion, review vendors, document law-enforcement access, and prevent sensitive categories from being repurposed through analytics or advertising systems. The FTC's business guidance follows a similar sequence in simpler operational language: know what is held, keep only what is needed, protect it, dispose of what is no longer needed, and prepare for incidents.
For AI systems, the controls have to reach derived artifacts. Prompts, uploaded files, chat histories, embeddings, summaries, saved memories, labels, safety traces, tool-call logs, and evaluation datasets can preserve sensitive meaning even when the original record is gone. A governance review that only asks whether a database contains names will miss the model-ready version of the dossier. Derived artifacts need their own retention limits, training-use defaults, deletion tests, access controls, and appeal records because they can re-identify, summarize, or operationalize a person after the source file has disappeared from view.
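One way to make deletion reach derived artifacts is to keep a lineage map and compute the full deletion scope from it. The sketch below assumes such a map exists. The artifact IDs, the `LINEAGE` structure, and both function names are hypothetical, and a real system would also have to handle backups, caches, and model weights, which this example does not attempt.

```python
from collections import deque

# Hypothetical lineage map: each artifact ID points to artifacts derived from it.
# IDs and store names are invented for illustration.
LINEAGE = {
    "upload:42": ["embedding:42a", "summary:42b"],
    "summary:42b": ["memory:42c", "eval-row:42d"],
    "embedding:42a": [],
    "memory:42c": [],
    "eval-row:42d": [],
    "upload:77": ["embedding:77a"],
    "embedding:77a": [],
}


def deletion_scope(lineage, source_id):
    """Return the source and every artifact transitively derived from it."""
    seen = {source_id}
    queue = deque([source_id])
    order = []
    while queue:
        current = queue.popleft()
        order.append(current)
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


def deletion_test(lineage, source_id, remaining_ids):
    """Return in-scope artifacts that still exist after a deletion run."""
    return [a for a in deletion_scope(lineage, source_id) if a in remaining_ids]


if __name__ == "__main__":
    print(deletion_scope(LINEAGE, "upload:42"))
    # A saved memory survived the deletion run, so the test reports it.
    print(deletion_test(LINEAGE, "upload:42", {"memory:42c", "embedding:77a"}))
```

A deletion test of this kind only works if derivations are recorded when they happen. An artifact that was never entered in the lineage map is invisible to the check, which is the failure the paragraph above warns about.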
The hard governance question is purpose separation. Security logging, abuse detection, product analytics, customer support, personalization, model evaluation, and legal hold can all justify some recordkeeping. They cannot all justify the same indefinite archive. A safer design keeps operational receipts narrow, separates audit evidence from product improvement, redacts unrelated content, limits access by role, and proves deletion in both source systems and derived stores.
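Purpose separation can be expressed as a policy table in which each purpose carries its own retention period and its own permitted roles. The following is a minimal sketch under stated assumptions: the purpose names are taken from the paragraph above, while the retention periods, role names, `PurposePolicy` type, and `may_access` function are illustrative inventions. Legal hold and audit exceptions are deliberately left out.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class PurposePolicy:
    retention_days: int
    allowed_roles: frozenset


# Hypothetical policy table; retention periods and roles are assumptions only.
POLICIES = {
    "security_logging": PurposePolicy(90, frozenset({"security"})),
    "customer_support": PurposePolicy(30, frozenset({"support"})),
    "product_analytics": PurposePolicy(14, frozenset({"analyst"})),
}


def may_access(purpose, role, collected_on, today):
    """Allow access only for a known purpose, a permitted role, and an unexpired record."""
    policy = POLICIES.get(purpose)
    if policy is None:
        return False  # Unlisted purposes are denied by default.
    expired = today > collected_on + timedelta(days=policy.retention_days)
    return role in policy.allowed_roles and not expired


if __name__ == "__main__":
    collected = date(2026, 1, 1)
    today = date(2026, 2, 1)
    print(may_access("security_logging", "security", collected, today))
    print(may_access("security_logging", "analyst", collected, today))
    print(may_access("product_analytics", "analyst", collected, today))
    print(may_access("model_training", "analyst", collected, today))
```

The design choice that matters is the default: a purpose that is not in the table gets no access, so a new use has to be justified and added explicitly rather than inheriting an existing archive.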
NIST's Privacy Framework treats privacy risk as a management problem for systems, products, and services, and the NIST AI Risk Management Framework and 2024 Generative AI Profile add AI-specific attention to provenance, third-party data, model documentation, testing, and value-chain risk. In the EU, GDPR Article 5 supplies familiar principles, including purpose limitation, data minimization, storage limitation, integrity and confidentiality, and accountability, while EU AI Act Article 10 requires data governance practices for high-risk AI systems that address data collection origins, preparation operations, suitability, bias examination, and relevant gaps for training, validation, and testing datasets.
The safety implication is straightforward: a dragnet is not merely a privacy risk. It is a security risk, an abuse risk, a discrimination risk, a procurement risk, a child-safety risk, and an institutional capture risk. A breached dataset can enable stalking and fraud. A brokered dataset can give public agencies an end run around legal process. A retained prompt log can expose legal, medical, spiritual, or workplace vulnerability. A model trained or retrieved over ungoverned data can turn old context collapse into fresh automated judgment.
A serious implementation would therefore require data-protection impact assessment for high-risk uses; stricter defaults for minors, health, employment, housing, credit, benefits, immigration, policing, and spiritual or intimate support contexts; procurement clauses for training use, retention, deletion, subprocessors, and incident notice; notice and appeal where data-driven outputs affect rights or opportunities; and independent audit where public or quasi-public systems use commercial data.
The Surveillance Ledger
The concrete control missing from most surveillance systems is a ledger that follows traces across collection, brokerage, inference, retention, and action. A privacy policy says what an institution promises. A surveillance ledger records what actually happens.
- Trace class: location, device identifier, search, purchase, message metadata, biometric signal, workplace telemetry, prompt, file upload, agent log, or derived inference.
- Collection context: the service, sensor, app, vendor, form, legal authority, or public record source that produced the trace.
- Purpose boundary: the use that justified collection and the uses explicitly barred, especially advertising, scoring, training, retrieval, pricing, public-sector procurement, or eligibility decisions.
- Retention and deletion: live period, audit period, backup expiry, legal-hold exceptions, deletion test, and whether embeddings, summaries, labels, memory, and downstream exports are covered.
- Sharing path: processors, brokers, advertisers, clean rooms, law-enforcement access, government purchasers, model providers, subprocessors, and cross-border recipients.
- Contestability: notice, access, correction, opt-out, erasure, appeal, human review, and downstream correction when the source record changes.
- Security posture: encryption, access review, least privilege, vendor audit rights, breach notice, incident record, and restrictions on sensitive categories.
That ledger connects this review to Data Minimization, Data Brokers, AI Data Retention, AI Data Provenance, AI System Inventory, AI Audit Trails, Vendor and Platform Governance, and Agent Audit and Incident Review. Without this kind of record, the dragnet remains administratively useful precisely because it is hard to inspect.
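The ledger fields listed above could be captured in a simple record type, with a check that reports which governance questions an entry cannot answer. This is a sketch of one possible shape, not a standard schema: the `LedgerEntry` class, its field names, and the `ledger_gaps` function are hypothetical, and the example values are invented.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class LedgerEntry:
    trace_class: str                 # e.g. "location", "prompt", "derived inference"
    collection_context: str          # service, sensor, vendor, or legal authority
    permitted_purposes: List[str]    # uses that justified collection
    barred_purposes: List[str]       # uses explicitly ruled out
    collected_on: date
    expires_on: Optional[date]       # None means no expiry was recorded
    covers_derived_artifacts: bool   # embeddings, summaries, labels, exports
    sharing_path: List[str] = field(default_factory=list)
    contest_channels: List[str] = field(default_factory=list)
    security_controls: List[str] = field(default_factory=list)


def ledger_gaps(entry: LedgerEntry) -> List[str]:
    """List the governance questions this entry cannot answer."""
    gaps = []
    if not entry.permitted_purposes:
        gaps.append("no stated purpose")
    if not entry.barred_purposes:
        gaps.append("no barred uses")
    if entry.expires_on is None:
        gaps.append("no expiry")
    if not entry.covers_derived_artifacts:
        gaps.append("derived artifacts not covered")
    if not entry.contest_channels:
        gaps.append("no contest or erasure channel")
    if not entry.security_controls:
        gaps.append("no security controls recorded")
    return gaps


if __name__ == "__main__":
    entry = LedgerEntry(
        trace_class="location",
        collection_context="hypothetical weather app SDK",
        permitted_purposes=["forecast"],
        barred_purposes=[],
        collected_on=date(2026, 1, 1),
        expires_on=None,
        covers_derived_artifacts=False,
    )
    for gap in ledger_gaps(entry):
        print(gap)
```

An empty sharing path is not flagged here because a trace that is never shared is a legitimate state. Whether that field is empty because nothing is shared or because nothing was recorded is a question for audit, not for this check.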
Where the Book Needs Updating
Data and Goliath is a 2015 book, and it shows. It predates the mainstream explosion of large language models, the current AI-agent boom, deepfake politics, modern data-center scale, app-store identity consolidation, the 2020s enforcement wave around sensitive location data, and the current generation of AI governance frameworks. A 2022 review published by American University's School of International Service already noted that the book had become dated in some technical respects and that its policy frame is strongly U.S.-focused.
Those limits matter. Surveillance governance now has to account for biometric systems, cross-border cloud dependencies, synthetic media, model training, public-sector AI procurement, recommender systems, workplace analytics, school platforms, companion systems, and the ability of models to infer sensitive traits from fragments. A privacy politics built only around 2015-era collection will miss how generated outputs and automated actions complete the loop.
The book also spends less time than today's reader may want on racialized surveillance, labor surveillance, border systems, and data extraction in the global supply chain. Those gaps are better filled by books and pages such as Dark Matters, Automating Inequality, Atlas of AI, and Privacy and Data Stewardship.
Even so, the book's age is part of its value. It prevents the AI debate from pretending that extraction began with generative models. AI companies inherited a web already trained to collect, log, monetize, retain, trade, and normalize personal data. The new systems are powerful partly because the social permission structure was built earlier.
What This Changes
The recurring pattern across surveillance, platform governance, automated welfare, algorithmic scoring, workplace dashboards, and AI companions is the same: a system first makes people legible, then uses that legibility to shape the choices available to them. The harm is not always dramatic. Often it arrives as convenience, personalization, risk management, security, optimization, or care.
Data and Goliath is useful because it refuses the individualization of the problem. Better passwords, encrypted apps, browser settings, and privacy hygiene are worth having, but they cannot substitute for public rules about collection, retention, secondary use, compelled access, auditing, liability, and institutional accountability. Personal discipline is not enough when the business model and the state both benefit from the record.
The book also clarifies why AI governance needs minimization and provenance at its center. A model-mediated society cannot treat every trace as legitimate raw material. Some data should not be collected. Some should not be retained. Some should not be reused for training. Some should not be fused with other datasets. Some should remain local, ephemeral, encrypted, aggregated, differentially private, or unavailable to institutions that would turn it into leverage.
The practical lesson is blunt: privacy is not nostalgia for a pre-digital self. It is a condition for agency, dissent, experimentation, repair, and unscored life. Without it, intelligence becomes easier to build, but human freedom becomes easier to administer.
Related Pages
- The Digital Person and the Dossier Machine
- Liquid Surveillance and the Data Flow of Everyday Life
- The Age of Surveillance Capitalism and the Prediction Market for Human Futures
- Data Cartels and the Information Monopoly Behind AI
- Automating Inequality and the Digital Poorhouse
- Atlas of AI and the Extraction Stack
- Subprime Attention Crisis and the Market That Measures Belief
- The Quantified Worker and Surveillance Labor
- The Agent Log Becomes the Receipt
- Data Minimization
- Data Brokers
- AI Data Retention
- Differential Privacy
- Real-Time Bidding
- AI Governance and AI Audits and Assurance
- Privacy and Data Stewardship
Source Discipline
The sources below do different jobs. Schneier's book page, Norton's listing, and contemporary reviews establish the book's publication context and reception. FTC reports and orders document current U.S. regulatory attention to data brokers, platform surveillance, retention, sensitive location data, real-time-bidding data misuse, and surveillance pricing. ODNI and DOJ sources document national-security treatment of commercially available and bulk sensitive personal data. NIST materials are voluntary risk-management frameworks, not statutes. GDPR Article 5 and EU AI Act Article 10 are legal anchors, but they apply by jurisdiction and system category.
The AI-era reading in this review is therefore an argued extension, not a claim that Schneier predicted every current system. The narrow claim is that the dragnet pattern he described now supplies data, incentives, and institutional habits for model training, retrieval, scoring, personalization, and automated judgment.
This page makes no claim that any AI system is conscious, divine, or AGI. The claim is institutional: data collection, retention, brokerage, and automated inference can create power without requiring a machine mind.
Sources
- Bruce Schneier, Data and Goliath, official book page, description, bestseller note, ISBNs, page count, excerpts, related interviews, and review index, reviewed June 25, 2026.
- W. W. Norton & Company, Data and Goliath by Bruce Schneier, publisher listing for ISBN 9780393352177, reviewed June 25, 2026.
- Jonathan A. Knee, "Looking at the Promise and Perils of the Emerging Big Data Sector", The New York Times DealBook, March 16, 2015, archived on Schneier on Security, reviewed June 25, 2026.
- Paul Bernal, review of Data and Goliath, Times Higher Education, May 21, 2015, archived on Schneier on Security, reviewed June 25, 2026.
- Ana Izabella Collares Williams, "Book Review - Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World", American University School of International Service, June 23, 2022, reviewed June 25, 2026.
- Federal Trade Commission, Data Brokers: A Call for Transparency and Accountability, May 2014, reviewed June 25, 2026.
- Federal Trade Commission, A Look Behind the Screens: Examining the Data Practices of Social Media and Video Streaming Services, staff report, September 2024, reviewed June 25, 2026.
- Federal Trade Commission, FTC surveillance pricing study press release, January 17, 2025, reviewed June 25, 2026.
- Federal Trade Commission, FTC final order with X-Mode and successor Outlogic, April 12, 2024, reviewed June 25, 2026.
- Federal Trade Commission, FTC action against Gravy Analytics and Venntel, December 3, 2024, reviewed June 25, 2026.
- Federal Trade Commission, FTC final order banning Mobilewalla from selling sensitive location data, January 14, 2025, reviewed June 25, 2026.
- Office of the Director of National Intelligence, Intelligence Community Policy Framework for Commercially Available Information, approved May 2024, and ODNI, release announcing the framework, reviewed June 25, 2026.
- U.S. Department of Justice, National Security Division, Data Security Program, bulk sensitive personal data and government-related data transaction restrictions under Executive Order 14117, reviewed June 25, 2026.
- Federal Trade Commission, Data Security business guidance and Protecting Personal Information: A Guide for Business, data-security guidance, reviewed June 25, 2026.
- NIST, Privacy Framework and Privacy Framework FAQ, voluntary privacy-risk management framework and core functions, reviewed June 25, 2026.
- NIST, AI Risk Management Framework, AI RMF Core, and Generative AI Profile, NIST AI 600-1, AI risk-management guidance and July 2024 generative AI profile, reviewed June 25, 2026.
- European Union, General Data Protection Regulation, Regulation (EU) 2016/679, Article 5 principles and Article 17 erasure rights, reviewed June 25, 2026.
- European Commission AI Act Service Desk, Article 10: Data and data governance, high-risk AI data requirements, reviewed June 25, 2026.
Book links are paid affiliate links. As an Amazon Associate I earn from qualifying purchases.
- Amazon, Data and Goliath by Bruce Schneier, retail listing, reviewed June 25, 2026.