Data Brokers
Data brokers are intermediaries that collect, infer, combine, license, sell, or otherwise provide personal, household, device, location, and behavioral data outside the context where it was originally produced. In AI systems, brokered data becomes training material, retrieval context, identity infrastructure, targeting fuel, and a governance test for consent, provenance, deletion, and institutional power.
Definition
A data broker is a role in a data supply chain: an intermediary that collects, receives, infers, enriches, matches, licenses, sells, or otherwise makes personal or household-level data available outside the relationship or first-party purpose in which the data was first generated. A broker may be a dedicated data company or a product line inside an advertising network, analytics vendor, identity service, people-search site, location-intelligence provider, lead generator, fraud tool, or AI data supplier.
The key feature is separation from the original context. A person may interact with an app, retailer, public office, website, loyalty program, car, phone, or public record system without knowing that the resulting data will be combined with other data, used to draw inferences, resold, or used in a different institutional setting.
Not every analytics vendor is a data broker in every jurisdiction, and the same company can have a direct relationship with people in one product while acting as a broker in another. Legal definitions vary. California focuses on businesses that knowingly collect and sell personal information about consumers with whom they do not have a direct relationship. Oregon defines a broker around collecting and selling or licensing brokered personal data. Texas combines registration, notice, security, and transfer-description duties. Federal regimes may instead use categories such as consumer reports, brokered personal data, sensitive data, bulk U.S. sensitive personal data, personally identifiable sensitive data, or commercially available information.
The practical governance boundary is use, not label. A company may avoid the phrase "data broker" while still brokering data through lead generation, identity resolution, enrichment, clean-room matching, location intelligence, people search, fraud scoring, or "commercially available information" procurement.
Boundary Tests
Four questions keep the term disciplined. First, did the person have a direct relationship with the buyer, or is the buyer receiving data from an outside intermediary? Second, has the data moved into a new purpose such as advertising, scoring, pricing, investigation, identity resolution, training, or retrieval? Third, has the data been transformed into an inference, segment, score, embedding, match key, or household link that still affects a person? Fourth, can the affected person see, correct, delete, or contest the profile and its downstream use?
These tests separate brokerage from ordinary service provision without letting labels do the work. A processor acting only under a customer's instructions may not be brokering data in a legal sense. A people-search site, location-intelligence vendor, lead generator, identity-resolution service, or enrichment API may be brokering data even when it describes the product as analytics, fraud prevention, audience quality, or public-record access.
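The four questions can be recorded per data flow so that reviewers answer each one explicitly instead of relying on a product label. The following is a minimal sketch, not a legal test: the `BoundaryAnswers` schema, the flag wording, and the two example flows are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryAnswers:
    """Answers to the four boundary questions for one data flow (hypothetical schema)."""
    direct_relationship_with_buyer: bool  # Q1: did the person deal with the buyer directly?
    moved_to_new_purpose: bool            # Q2: advertising, scoring, training, retrieval, ...
    transformed_into_derivative: bool     # Q3: inference, segment, score, embedding, match key
    person_can_see_and_contest: bool      # Q4: see, correct, delete, or contest the profile


def boundary_flags(a: BoundaryAnswers) -> list[str]:
    """Return the questions whose answers point toward brokerage-style review."""
    flags = []
    if not a.direct_relationship_with_buyer:
        flags.append("outside intermediary")
    if a.moved_to_new_purpose:
        flags.append("new purpose")
    if a.transformed_into_derivative:
        flags.append("derived artifact")
    if not a.person_can_see_and_contest:
        flags.append("no visibility or contest path")
    return flags


# Illustrative flow 1: an enrichment API described as "audience quality".
enrichment_api = BoundaryAnswers(False, True, True, False)
# Illustrative flow 2: a processor acting only on a customer's instructions.
processor = BoundaryAnswers(True, False, False, True)

print(boundary_flags(enrichment_api))
print(boundary_flags(processor))
```

An empty list does not establish that a flow is lawful; it only means none of the four questions raised a brokerage signal under the answers given.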
The high-stakes boundary is consequential use. Brokered data used for eligibility, pricing, housing, employment, credit, insurance, benefits, policing, immigration, health-adjacent targeting, or personalized persuasion needs stronger evidence than brokered data used for ordinary address correction. The same source file can be low-risk in one workflow and unsafe in another.
Snapshot
- Core function: collect, combine, infer, match, license, or sell personal, household, device, location, and behavioral data across contexts.
- Common buyers and uses: advertising, identity resolution, fraud detection, people search, lead generation, public-sector procurement, political targeting, AI enrichment, training, retrieval, and scoring.
- High-risk categories: precise geolocation, health and reproductive-health data, financial data, biometric data, children's data, credentials, government identifiers, military status, and government-related or national-security-sensitive data.
- Governance minimum: source inventory, field-level sensitivity review, legal basis, purpose limits, deletion path, retention period, downstream recipient log, contract restrictions, and evidence that opt-outs propagate.
- Governance test: can the buyer trace the data from collection through inference, resale, model use, deletion, and downstream transfer?
- Boundary caution: a registry entry, contract, clean-room match, de-identification claim, or "commercially available" label does not prove consent, accuracy, lawful use, or suitability for an AI decision system.
Data Flows and Products
Common inputs include public records, property records, court records, voter files where legally available, marketing lists, purchase histories, loyalty programs, warranty registrations, app software development kits, mobile advertising IDs, precise or inferred location, web browsing signals, search and social signals, vehicle data, identity graphs, demographic estimates, and other data purchased from partner exchanges or downstream brokers.
Common outputs include people-search profiles, household dossiers, address and phone append services, identity verification, fraud and risk products, ad segments, location-intelligence products, foot-traffic analytics, lead lists, political audiences, propensity scores, lookalike audiences, and enrichment fields that attach income, interests, health proxies, family status, military status, housing signals, or vulnerability indicators to a record.
Brokerage increasingly includes derived artifacts rather than only files of records. Identity graphs, household links, risk tiers, geofence audiences, embeddings, clean-room matches, suppression lists, and "do not contact" files can all carry personal-data consequences even when a buyer sees only a token, segment, or score.
The data product may not look personal at the point of use: the buyer may see only a model feature, dashboard, or API response rather than a record about a person. Governance still has to follow the provenance. If the product was built from location, health, financial, biometric, child, credential, or government-ID data, the sensitivity does not disappear because the final screen shows an inference.
Some products cross legal categories as they move. A marketing segment can become an eligibility factor if used for credit, housing, employment, insurance, tenant screening, or government-benefit triage. A location or purchase-history feed can become health-adjacent when it implies clinic visits, pregnancy, addiction treatment, disability, worship, union activity, or household vulnerability. The buyer's use can matter as much as the broker's sales label.
Brokerage Ledger
A brokered-data review should produce an evidence ledger, not only a vendor-risk memo. For each dataset, segment, score, identity graph, feature feed, or API response, the ledger should record the original source category, collection context, broker role, data fields, sensitivity level, jurisdiction, legal basis or consent claim, permitted uses, excluded uses, retention period, deletion path, opt-out propagation, downstream recipients, audit rights, and whether the data may be used for training, retrieval, matching, scoring, or human eligibility decisions.
The ledger should separate source evidence from permission evidence. "Public record," "commercially available," "registered broker," or "contractually permitted" does not answer whether the person had notice, whether sensitive inferences are reliable, whether opt-out applies, or whether an AI use is compatible with the original collection context.
The ledger should follow derivatives. If a brokered file becomes embeddings, a vector index, a customer-enrichment feature, a synthetic dataset, a suppression list, a fraud model, or an agent memory, the inventory should connect those artifacts back to the original brokered source. Otherwise an organization can delete the visible purchase order while keeping the operational profile.
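One way to picture such a ledger is as a record type that keeps source evidence and permission evidence in separate fields and links each derivative to its parent. The sketch below is an assumption-laden illustration: the `LedgerEntry` fields are a subset of those listed above, and every name and example value is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LedgerEntry:
    """One row of a brokered-data evidence ledger (field names are illustrative)."""
    artifact_id: str
    source_category: str
    sensitivity: str
    source_evidence: list[str] = field(default_factory=list)      # where the data came from
    permission_evidence: list[str] = field(default_factory=list)  # notice, consent, opt-out proof
    derived_from: Optional[str] = None  # parent artifact_id, so derivatives trace back


# Labels that describe a source or a status but do not show permission.
LABELS_NOT_PERMISSION = {"public record", "commercially available",
                         "registered broker", "contractually permitted"}


def permission_gaps(entry: LedgerEntry) -> list[str]:
    """List reasons the entry cannot yet support an intended use."""
    gaps = []
    if not entry.source_evidence:
        gaps.append("no source evidence")
    real = [e for e in entry.permission_evidence if e.lower() not in LABELS_NOT_PERMISSION]
    if not real:
        gaps.append("no permission evidence beyond labels")
    return gaps


def root_source(entry: LedgerEntry, ledger: dict[str, LedgerEntry]) -> str:
    """Follow derived_from links back to the original brokered artifact."""
    seen = set()
    current = entry
    while current.derived_from is not None and current.artifact_id not in seen:
        seen.add(current.artifact_id)  # guard against accidental cycles
        current = ledger[current.derived_from]
    return current.artifact_id


ledger = {
    "feed-1": LedgerEntry("feed-1", "app SDK location feed", "sensitive",
                          source_evidence=["vendor source list"],
                          permission_evidence=["commercially available"]),
    "emb-7": LedgerEntry("emb-7", "embedding index", "sensitive", derived_from="feed-1"),
}
print(permission_gaps(ledger["feed-1"]))
print(root_source(ledger["emb-7"], ledger))
```

Keeping `derived_from` on every row is the design choice that matters here: it lets an inventory answer "which brokered purchase does this index come from?" without relying on purchase-order records.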
For AI procurement, the minimum test is whether a buyer can reject or quarantine brokered data before it reaches a model, feature store, retrieval system, or decision workflow. That test connects data brokerage to AI Data Provenance, AI Data Retention, AI Audit Trails, AI Procurement, and AI System Inventory.
Current Context
As of June 25, 2026, data-broker governance in the United States is a patchwork of enforcement orders, state registration and deletion systems, national-security restrictions, and consumer-reporting debates. The Federal Trade Commission's 2014 report described a market with limited consumer visibility and recommended transparency and control. Since 2024, FTC location-data cases have turned that general concern into concrete orders against firms including X-Mode/Outlogic, InMarket, Mobilewalla, and Gravy Analytics/Venntel. A proposed order announced in May 2026 would resolve the FTC's Kochava litigation if a federal judge approves it. These orders are named-party remedies, not a general federal data-broker statute.
Pricing is now an explicit data-broker-adjacent concern. FTC staff's January 2025 surveillance-pricing materials described intermediary firms that can use granular consumer data, including location, demographics, browsing patterns, shopping history, mouse movements, and cart behavior, to help tailor prices, discounts, search results, or offers. That staff work is not a final rule and does not prove that every retailer uses individualized pricing, but it shows why brokered and inferred data should be reviewed for offer design as well as advertising.
California has moved beyond ordinary broker registration toward one-to-many deletion. The California Privacy Protection Agency says California residents may use DROP, the Delete Request and Opt-out Platform, to submit one deletion request to active registered data brokers as of January 1, 2026. The public DROP site says data brokers must begin processing requests on August 1, 2026 and must process deletion lists at least every 45 days, and that they may take up to three months to process updated requests. CPPA's DROP materials also say a match should delete associated personal information, including sensitive and inferred information, while excluding first-party, exempt, and publicly available data. That makes derivative tracking part of practical compliance.
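For an internal scheduling reminder, the cadence described above reduces to simple date arithmetic. This sketch takes the August 1, 2026 start date and the 45-day interval from the paragraph above as given; the function name and the treatment of a broker that has never processed a list are assumptions, and the code does not model the regulation's actual requirements.

```python
from datetime import date, timedelta
from typing import Optional

DROP_START = date(2026, 8, 1)   # processing start date as stated in the text
MAX_GAP = timedelta(days=45)    # "at least every 45 days" as stated in the text


def is_overdue(last_processed: Optional[date], today: date) -> bool:
    """Return True if the next deletion-list run is past the assumed interval."""
    if today < DROP_START:
        return False  # processing has not started yet
    if last_processed is None:
        return today > DROP_START  # assumption: first run expected on the start date
    return today - last_processed > MAX_GAP
```

For example, `is_overdue(date(2026, 8, 1), date(2026, 9, 16))` is true because 46 days have elapsed, while the same check on September 15 is false.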
Other state regimes add registration and security duties. Texas requires covered data brokers to register with the Secretary of State, post a conspicuous website or app notice, describe categories of data processed and transferred, maintain a comprehensive information security program, and take measures for service-provider safeguards; the Texas Attorney General also says the Secretary of State is the filing officer and does not investigate alleged violations. Oregon requires data brokers to register before collecting, selling, or licensing brokered personal data in Oregon, and defines licensing as granting access to or distributing data for consideration. These registries improve visibility but do not by themselves prove that every downstream use is fair, accurate, consented, or safe.
Federal policy is also treating brokered data as a security issue. DOJ's Data Security Program under Executive Order 14117 became effective on April 8, 2025, with some affirmative due-diligence, audit, annual-report, and rejected-transaction reporting requirements taking effect in October 2025. It addresses covered transactions involving bulk U.S. sensitive personal data and U.S. government-related data with countries of concern or covered persons. PADFAA is a separate statute enforced by the FTC; it prohibits a data broker from selling, licensing, renting, trading, transferring, releasing, disclosing, providing access to, or otherwise making personally identifiable sensitive data of a U.S. individual available to a foreign adversary country or a controlled entity. Neither regime is a general consumer privacy statute, but both show that brokered data is now treated as national-security infrastructure as well as advertising supply.
The CFPB proposed a Regulation V rule in December 2024 that would have applied Fair Credit Reporting Act protections to certain sensitive consumer information sold by data brokers, but the Bureau withdrew the proposal on May 15, 2025 and said it would take no further action on that proposed rule. The withdrawal does not make brokered data harmless; it means that federal FCRA expansion is not a live rulemaking as of this review. Where brokered data is used for credit, employment, insurance, housing, tenant screening, or other eligibility decisions, reviewers still need to analyze existing FCRA, state privacy, consumer-protection, civil-rights, and sectoral duties rather than cite the withdrawn proposal as operative law.
AI Relevance
AI increases the value and risk of brokered data because models can join weak signals, infer traits, predict vulnerability, personalize prices, rank leads, generate persuasive content, enrich customer records, and automate decisions across contexts users never see.
Brokered data can enter AI systems in several ways. It can be training data, retrieval data, evaluation data, enrichment data for customer records, identity-resolution data, advertising audience data, fraud features, or context fed into an agent before it acts. The same dataset can move from marketing to credit, insurance, employment, public benefits, policing, or immigration through a vendor or customer integration.
Brokered access can matter even without a bulk file transfer. An API lookup, people-search portal, clean-room match, enrichment call, identity graph, or model context connector can expose the system to brokered data at runtime. AI governance therefore has to inventory data services, tool calls, and data-residency paths, not only static datasets.
Brokered data also expands the attack surface for agents. An agent with CRM, ad, identity, people-search, procurement, or case-management access may silently import brokered profiles into emails, sales workflows, screening, investigations, or recommendations unless tool permissions, logs, and purpose limits are explicit.
For AI governance, the problem is not only whether a dataset contains names. Brokered identifiers, device graphs, location trails, household links, purchase histories, and demographic inferences can make model outputs more targeted and more invasive. They can also create bias, false matches, feedback loops, and hard-to-contest decisions when the affected person cannot see the profile or correct its source.
De-identification, pseudonymization, aggregation, and synthetic-data claims need evidence rather than trust. A brokered segment, embedding, vector index, model feature, or synthetic sample can preserve sensitive meaning or become linkable when joined with identity graphs, location trails, purchase histories, or public records.
Governance and Safety
Any AI system that uses brokered data should document the source, acquisition path, legal basis, permitted uses, data categories, sensitive fields, opt-out and deletion obligations, retention period, downstream recipients, and whether the broker can audit or revoke use. A contract is not enough if the developer cannot prove what data entered which model, index, feature store, or decision system.
Procurement review should ask the broker or data vendor for source categories, original collection context, consent or lawful-basis claims, known restrictions, training and retrieval permissions, sensitive-category exclusions, deletion propagation, downstream transfer limits, security controls, and audit rights. The buyer should be able to reject a dataset because it is too opaque or too sensitive for the intended use, even if it is marketed as compliant.
Intake should include a source quarantine step before data enters a model, vector database, feature store, identity graph, or agent tool. Quarantine means the buyer can inspect fields, remove prohibited categories, check consent and opt-out claims, test match accuracy, assign retention rules, and block training or retrieval use until the legal and safety basis is recorded.
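The quarantine step can be sketched as a gate that returns one of three outcomes before any downstream system sees the batch. Everything below is illustrative: the field names in `PROHIBITED_FIELDS`, the `person_id` key, and the 20 percent opt-out threshold are assumptions, not values drawn from any statute or vendor contract.

```python
from dataclasses import dataclass, field

# Hypothetical field names standing in for prohibited categories.
PROHIBITED_FIELDS = {"precise_lat", "precise_lon", "health_segment", "ssn"}


@dataclass
class QuarantineResult:
    decision: str  # "release", "hold", or "do_not_use"
    cleaned_rows: list[dict] = field(default_factory=list)
    reasons: list[str] = field(default_factory=list)


def quarantine(rows: list[dict], opted_out_ids: set[str],
               basis_recorded: bool, max_opt_out_rate: float = 0.2) -> QuarantineResult:
    """Inspect a brokered batch before it reaches a model, index, or feature store."""
    kept = [r for r in rows if r.get("person_id") not in opted_out_ids]
    dropped = len(rows) - len(kept)
    if rows and dropped / len(rows) > max_opt_out_rate:
        # Many opted-out people in the file suggests opt-outs are not propagating.
        return QuarantineResult("do_not_use", [], ["opt-out rate above threshold"])
    reasons = []
    if any(r.keys() & PROHIBITED_FIELDS for r in kept):
        reasons.append("prohibited fields removed")
    cleaned = [{k: v for k, v in r.items() if k not in PROHIBITED_FIELDS} for r in kept]
    if not basis_recorded:
        reasons.append("legal and safety basis not recorded")
        return QuarantineResult("hold", cleaned, reasons)
    return QuarantineResult("release", cleaned, reasons)


rows = [
    {"person_id": "a", "email_hash": "x", "precise_lat": 1.0},
    {"person_id": "b", "email_hash": "y"},
]
held = quarantine(rows, opted_out_ids=set(), basis_recorded=False)
print(held.decision, held.reasons)
```

The gate deliberately has a "do_not_use" outcome of its own, so that rejecting a batch is a first-class result and not a failed approval.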
Red flags include missing source categories, unrestricted resale rights, no opt-out propagation, no deletion proof for derivatives, broad model-training rights by default, sensitive geofence products, household vulnerability segments, weak de-identification evidence, and contracts that prevent audit or disclosure to affected people. A brokered-data intake should be allowed to return "do not use" rather than only "approved with conditions."
Restricted-use policy should be explicit. Precise location, reproductive-health signals, children's data, biometric data, credentials, military or government identifiers, and household vulnerability segments should be barred from general-purpose training, broad retrieval, open-ended personalization, and agent memory unless a narrow legal basis, necessity showing, retention limit, and contestability path exist.
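A restricted-use policy of this kind can be written as a default-deny check. The sketch below mirrors the categories, uses, and conditions named in the paragraph above; the identifier spellings and the function name are assumptions, and a real policy would need category definitions precise enough to classify actual fields.

```python
RESTRICTED_CATEGORIES = {
    "precise_location", "reproductive_health", "children", "biometric",
    "credentials", "military_or_government_id", "household_vulnerability",
}
BROAD_USES = {"general_training", "broad_retrieval",
              "open_ended_personalization", "agent_memory"}
REQUIRED_CONDITIONS = {"narrow_legal_basis", "necessity_showing",
                       "retention_limit", "contestability_path"}


def passes_restricted_use_policy(category: str, use: str, conditions_met: set[str]) -> bool:
    """Deny restricted categories for broad uses unless every condition is met."""
    if category in RESTRICTED_CATEGORIES and use in BROAD_USES:
        return REQUIRED_CONDITIONS <= conditions_met  # subset test: all four required
    return True  # outside this policy's scope; other reviews may still apply
```

For instance, `passes_restricted_use_policy("precise_location", "agent_memory", {"retention_limit"})` is false, because a retention limit alone does not satisfy all four conditions.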
Deletion and opt-out controls should cover derivatives. Suppression lists, match keys, embeddings, vector indexes, audience segments, lead scores, synthetic examples, and model features should be tied back to the brokered source so a deletion request, contract termination, or source-quality failure can propagate beyond the original file.
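If lineage is recorded as a mapping from each artifact to the artifacts derived from it, the scope of a deletion request can be computed with a breadth-first walk. This is a minimal sketch: the artifact names and the `DERIVED` mapping are hypothetical, and real lineage would come from the organization's own inventory.

```python
from collections import deque

# Hypothetical lineage: each artifact maps to the artifacts derived from it.
DERIVED = {
    "broker_file_2025q1": ["match_keys", "audience_segment"],
    "match_keys": ["embeddings"],
    "embeddings": ["vector_index"],
    "audience_segment": ["lead_scores"],
}


def deletion_scope(source: str, derived: dict[str, list[str]]) -> list[str]:
    """Return the source and every downstream derivative, in breadth-first order."""
    order, seen, queue = [], {source}, deque([source])
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in derived.get(node, []):
            if child not in seen:  # visit each artifact once, even with shared parents
                seen.add(child)
                queue.append(child)
    return order


scope = deletion_scope("broker_file_2025q1", DERIVED)
print(scope)
```

The walk only identifies what a deletion must reach. Proving that each artifact was actually purged or retrained still requires evidence from the system that holds it.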
High-risk uses need stronger controls: housing, employment, credit, insurance, health, education, public benefits, law enforcement, immigration, political persuasion, child-directed systems, reproductive-health contexts, and intimate or companion products. In these domains, brokered data should trigger impact assessment, accuracy testing, human review, notice and appeal, and data minimization by default.
A buyer should also test for substitution risk: if the organization would need consent, legal process, a regulated consumer report, or a formal impact assessment to collect the information directly, purchasing the same signal from a broker should not erase that obligation.
Government use requires special caution. Commercially available information can let public agencies obtain profiles, location data, social signals, or identity-resolution tools through procurement rather than ordinary legal process. ODNI's 2024 framework defines commercially available information and acknowledges the need for baseline rules for intelligence-community access, collection, and processing. The governance question is whether a purchase should be allowed to bypass the legal friction that would apply if the same information were demanded directly.
Safety review should also cover data poisoning and security. A brokered dataset can contain stale, false, malicious, mislabeled, or unlawfully collected data. If that data feeds a model, matcher, risk engine, or agent, errors can scale into automated harm. If it contains credentials, government identifiers, device IDs, or detailed movement trails, breach consequences can include fraud, stalking, blackmail, or operational exposure for officials, service members, journalists, activists, and vulnerable communities.
Source Discipline
Claims about data brokers should distinguish statutes, proposed rules, final agency orders, complaints, guidance, registry entries, company marketing, and civil-society research. A complaint states allegations. A proposed order is not the same as a final order. A registry entry shows self-reported or filed information, not regulator approval of a business model.
Name the data category. Personal data can mean a name and address, a persistent device identifier, precise geolocation, biometric information, login credentials, health information, financial data, a household linkage, a reproductive-health inference, or a model-ready audience segment. Governance duties and risks change with the category.
Name the transaction. Selling, licensing, sharing, appending, scoring, matching, API access, clean-room matching, training, retrieval, and public-agency procurement are different flows. Do not collapse them into a generic data sale if the factual claim depends on the mechanism.
For current law, preserve dates, jurisdiction, and procedural status. California DROP timing, FTC orders, DOJ data-security rules, PADFAA duties, and the withdrawn CFPB FCRA proposal are not interchangeable. A source may show a rule is proposed, withdrawn, live, effective, stayed, finalized, or only announced. Do not infer legal compliance from a dataset being called public, commercially available, anonymized, or registered.
For pricing, persuasion, and AI claims, state the evidentiary limit. A surveillance-pricing inquiry or staff perspective shows regulator concern and observed intermediary capabilities; it is not proof that a specific seller changed a specific person's price. A data-broker marketing page may show offered capability; it does not prove deployment, accuracy, lawfulness, or consumer impact.
Risk Pattern
Context collapse. Data generated in one setting becomes input to decisions in another.
Invisible dossiers. People may not know which broker holds a profile, what it says, who bought it, or how to correct it.
Inference laundering. Sensitive facts can reappear as segments, scores, likelihoods, or proxy variables.
Decision laundering. A buyer can blame a vendor score while the broker blames source data and no one explains the decision.
Government bypass. Public agencies can be tempted to buy what they might otherwise need a warrant, subpoena, statute, or public process to obtain.
Foreign-access risk. Bulk brokered data can expose service members, officials, dissidents, journalists, activists, and critical-infrastructure workers to coercion or targeting.
Safety mismatch. Data collected for advertising may be too inaccurate or biased for housing, employment, benefits, insurance, policing, or health-adjacent uses.
Derivative survival. The raw file may be deleted while features, embeddings, audiences, suppression lists, model weights, and customer exports continue to carry the profile.
Agentic misuse. Brokered profiles can be pulled into emails, recommendations, workflows, or investigations by AI agents without a person noticing the source or contestability problem.
Deletion failure. Brokered data often has copies, derivatives, suppression lists, model features, embeddings, and downstream customer exports that make deletion difficult to verify.
Spiralist Reading
For Spiralism, data brokerage is invisible biography.
It lets institutions act on a shadow version of a person without the person knowing what story has been assembled, who assembled it, who bought it, or what doors it quietly opened or closed. The person lives one life. The broker sells another.
The AI-era danger is that the shadow biography becomes active. It does not merely sit in a database. It becomes prompt context, model feature, agent memory, risk score, price signal, ad target, search result, government lead, or institutional suspicion. The Mirror does not need to know the soul. It only needs enough fragments to make the person administratively legible.
Open Questions
- What evidence should prove that deletion, opt-out, or suppression requests reached derived scores, embeddings, identity graphs, and downstream customers?
- Which categories of brokered data should be categorically barred from AI training, retrieval, personalization, or scoring?
- When should public-agency purchase of commercially available data require the same legal process or public justification that direct collection would require?
- How should law and procurement distinguish marketing enrichment from consumer reporting, eligibility screening, or other consequential decision support?
- How should foreign-access restrictions, state deletion systems, and domestic privacy rights interact when the same brokered dataset crosses all three regimes?
Related Pages
Privacy and Data Markets
- Surveillance Capitalism
- Real-Time Bidding
- Contextual Integrity
- Data Minimization
- Digital Identity
- Data Trusts
- Differential Privacy
- Biometric Categorization
AI Systems
- Training Data
- AI Data Licensing
- AI Data Provenance
- AI Data Retention
- AI Audit Trails
- AI System Inventory
- AI Procurement
- AI Memory and Personalization
- Retrieval-Augmented Generation
- Vector Databases
- Model Context Protocol
- AI Agent Observability
- Prompt Injection
- Data Poisoning
- Agentic Commerce
- AI Persuasion
- Algorithmic Bias
- Opaque Scoring Systems
- Synthetic Identity Fraud
Governance
- Privacy and Data
- Transparency and Public Registers
- Vendor and Platform Governance
- AI in Government and Public Services
- AI in Finance
- AI in Employment
- AI in Healthcare
- Algorithmic Impact Assessments
- Age Assurance
- Right to Explanation
- Notice and Appeal
- AI Data Residency
- Deceptive Design Patterns
- Lina Khan
Related Essays
- The Location Broker Becomes the Shadow Sensor Network
- The Price Becomes Personalized Prediction
- Data Cartels and the Information Monopoly Behind AI
Sources
- Federal Trade Commission, Data Brokers: A Call for Transparency and Accountability, May 2014.
- Federal Trade Commission, FTC Finalizes Order with X-Mode and Successor Outlogic Prohibiting it from Sharing or Selling Sensitive Location Data, April 12, 2024.
- Federal Trade Commission, FTC Finalizes Order with InMarket Prohibiting It from Selling or Sharing Precise Location Data, May 1, 2024.
- Federal Trade Commission, FTC Finalizes Order Banning Mobilewalla from Selling Sensitive Location Data, January 14, 2025.
- Federal Trade Commission, FTC Finalizes Order Prohibiting Gravy Analytics, Venntel from Selling Sensitive Location Data, January 14, 2025.
- Federal Trade Commission, FTC to Ban Kochava and Subsidiary from Selling Sensitive Location Data to Settle Charges, May 4, 2026.
- Federal Trade Commission, FTC Surveillance Pricing Study Indicates Wide Range of Personal Data Used to Set Individualized Consumer Prices, January 17, 2025.
- Federal Trade Commission, Surveillance Pricing Update and the Work Ahead, January 17, 2025.
- California Privacy Protection Agency, Information for Data Brokers, reviewed June 25, 2026.
- California Privacy Protection Agency, California Data Broker Registry, reviewed June 25, 2026.
- California Privacy Protection Agency, Delete Request and Opt-out Platform (DROP), reviewed June 25, 2026.
- California Privacy Protection Agency, How DROP works, reviewed June 25, 2026.
- California Privacy Protection Agency, Accessible Deletion Mechanism: Delete Request and Opt-out Platform System Requirements, effective January 1, 2026.
- California Privacy Protection Agency, California Approves Delete Act Regulations, November 13, 2025.
- Texas Secretary of State, Data Brokers, reviewed June 25, 2026.
- Texas Attorney General, Texas Data Broker Act, reviewed June 25, 2026.
- Oregon Division of Financial Regulation, Data Broker Registry, reviewed June 25, 2026.
- U.S. Department of Justice National Security Division, Data Security Program, reviewed June 25, 2026.
- U.S. Department of Justice, Justice Department Implements Critical National Security Program to Protect Americans' Sensitive Data from Foreign Adversaries, April 11, 2025.
- Federal Trade Commission, Protecting Americans' Data from Foreign Adversaries Act of 2024 (PADFAA), reviewed June 25, 2026.
- Federal Trade Commission, FTC Reminds Data Brokers of Their Obligations to Comply with PADFAA, February 9, 2026.
- Consumer Financial Protection Bureau, Protecting Americans from Harmful Data Broker Practices (Regulation V), proposed rule page, reviewed June 25, 2026.
- GovInfo, Protecting Americans From Harmful Data Broker Practices (Regulation V); Withdrawal of Proposed Rule, May 15, 2025.
- Office of the Director of National Intelligence, Intelligence Community Policy Framework for Commercially Available Information, May 2024.