Blog · Review Essay · Last reviewed June 23, 2026

The Digital Person and the Dossier Machine

Daniel J. Solove's The Digital Person is a privacy book that has aged into an AI governance book. Its core warning is that databases do not merely reveal people. They assemble administrative versions of people, circulate those versions through institutions, and make decisions around records the subject cannot fully see, correct, or contest.

For this review, a digital dossier means a linked or linkable administrative profile assembled from identifiers, public records, transactions, device and location traces, inferred traits, brokered data, and institutional records. The danger is not only exposure. It is that the profile becomes a working substitute for the person in decisions they cannot inspect or challenge.

The Book

The Digital Person: Technology and Privacy in the Information Age was published by NYU Press in December 2004, with a 283-page hardcover edition and a paperback edition following in 2006. NYU Press lists it as the first volume in the Ex Machina: Law, Technology, and Society series. Solove was then an associate professor at George Washington University Law School, and the GW Law repository makes the full text available as a faculty publication.

The book studies the social, political, and legal consequences of personal information held in computer databases. Its central term is the "digital dossier": the assembled record of transactions, identifiers, public records, browsing traces, background checks, credit information, and other fragments that businesses and government agencies use to make decisions about people.

That makes the book an important companion to The Black Box Society, Data and Goliath, Automating Inequality, Delete, and The Ordinal Society. Solove is earlier than most of them, and less focused on machine learning, but his account of database personhood explains the administrative substrate those later systems inherit.

Current Context

As of June 23, 2026, Solove's dossier problem is not a period concern from the early web. It is a live infrastructure problem. The Federal Trade Commission's 2014 data-broker report described a market in which consumers often could not know which brokers held data about them, how the data was obtained, or how it was being used. That visibility problem now sits underneath mobile advertising, identity resolution, risk scoring, fraud prevention, public-sector procurement, and AI personalization.

Recent U.S. enforcement and regulation make the point concrete without pretending there is a single national privacy settlement. FTC location-data cases against data brokers and advertising intermediaries have focused on sensitive geolocation, consent, retention, and downstream disclosure. On May 4, 2026, the FTC announced a proposed stipulated order with Kochava and a subsidiary over sensitive location data; the agency noted that such orders have the force of law when approved and signed by a federal district judge. These are case-specific controls, not a general replacement for privacy law.

California's Delete Act moved the issue from disclosure to operational deletion. The California Privacy Protection Agency's DROP system launched January 1, 2026, for California residents to submit one deletion request to active registered data brokers. CPPA materials state that data brokers must begin processing those requests on August 1, 2026, and delete matching data within 90 days. That is exactly the kind of dossier-level remedy Solove's frame implies: the governance object is the profile and its downstream copies, not merely one disclosed fact.

The national-security frame has also changed. The Department of Justice Data Security Program, effective April 8, 2025 under Executive Order 14117, treats access to Americans' bulk sensitive personal data and U.S. government-related data by countries of concern as a security problem. That does not make every privacy issue a national-security issue. It does show that commercially available dossiers can become strategic infrastructure once they include genomic, geolocation, biometric, health, financial, or government-related data.

The current federal picture remains uneven. The CFPB issued a December 2024 proposed Regulation V rule that would have applied Fair Credit Reporting Act concepts to sensitive consumer information sold by data brokers, then withdrew the proposal effective May 15, 2025. That sequence matters for source discipline: the proposal is evidence of regulatory concern, but it is not an operative rule.

State automated-decision rules are beginning to connect privacy rights to system rights. California's privacy regulator says its CCPA regulations on risk assessments, cybersecurity audits, automated decisionmaking technology, and related updates became effective January 1, 2026. Its September 2025 announcement says covered risk-assessment duties begin January 1, 2026, while businesses using automated decisionmaking technology for significant decisions must comply with ADMT requirements beginning January 1, 2027. The lesson for Solove's book is direct: once a dossier feeds a consequential automated decision, the remedy has to include evidence of the decision path, not only a notice that data was collected.

The Dossier as Person

The strongest idea in The Digital Person is that privacy harm is not limited to exposure. The old mental picture says privacy is invaded when a hidden room is entered or a secret is published. Solove argues that digital databases create a different kind of vulnerability. Much of the information may be mundane, semi-public, voluntarily submitted, or collected through ordinary participation. The harm comes from aggregation, circulation, interpretation, and use.

A dossier is not a mirror. It is a decision interface. It gives lenders, employers, marketers, police, platforms, insurers, schools, landlords, benefits agencies, and fraud teams a version of the person they can sort, search, flag, price, exclude, target, approve, or escalate. Once that version becomes operational, the person has to live with the consequences of a record they did not design.

A sharper definition is that a dossier is a claim stack. Each field or inference says something about a person, carries a source, has a collection context, travels to recipients, and expires or fails to expire under some rule. AI systems make that stack executable: retrieval systems find it, scoring systems weight it, agents act on it, and generated explanations can make a weak record sound administratively settled.
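The claim-stack idea can be made concrete with a small data structure. What follows is a minimal sketch, not anything drawn from Solove's book or from a real system: the Claim class, its field names, and the helper functions are all hypothetical, chosen only to mirror the elements listed above (statement, source, collection context, recipients, expiry).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Claim:
    """One assertion a dossier makes about a person (illustrative fields only)."""
    statement: str                   # what the record says about the person
    source: str                      # who supplied the claim
    collection_context: str          # why it was originally collected
    inferred: bool = False           # True if derived rather than observed
    recipients: tuple = ()           # who the claim has been passed to
    expires: Optional[date] = None   # None models a claim with no expiry rule


def active_claims(stack, today):
    """Return claims still in force on `today`; a claim without expiry never lapses."""
    return [c for c in stack if c.expires is None or c.expires >= today]


def needs_review(claim):
    """Flag the riskiest combination: an inference that also never expires."""
    return claim.inferred and claim.expires is None


# Hypothetical example data.
stack = [
    Claim("shops at a pharmacy chain", "loyalty program", "discount enrollment",
          expires=date(2026, 1, 1)),
    Claim("likely chronic condition", "broker segment", "unknown", inferred=True),
]

still_active = active_claims(stack, date(2026, 6, 23))
flagged = [c.statement for c in still_active if needs_review(c)]
print(flagged)
```

In this toy stack the observed fact has expired, while the weak inference remains active indefinitely, which is the pattern the paragraph above warns about.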

This is why the book remains useful after the first wave of internet privacy debates. Solove is not only worried that someone knows too much. He is worried that records become decision infrastructure. The database does not have to be malicious to be dangerous. It can be incomplete, stale, decontextualized, inferred from weak signals, merged through data brokers, retained beyond its purpose, or treated as more authoritative than the person standing in front of the institution.

That is a legibility problem. A society that runs through databases must translate people into fields, identifiers, categories, and risk signals. The translation can make administration easier, but it also creates a second self that is easier to govern than the original. The record becomes portable. The person becomes locally absent from decisions made in their name.

The modern version includes identity graphs, device identifiers, hashed emails, loyalty records, location trails, inferred households, advertising segments, risk scores, embeddings, vector indexes, and agent memory. Each artifact can look small in isolation. Together they form the profile that institutional systems read as the person.

The Kafka Problem

Solove's most durable move is his shift away from Orwell as the master metaphor for privacy harm. Orwell helps describe centralized watching, fear, and political domination. But many database harms are more bureaucratic than theatrical. They are closer to Kafka: opaque procedure, inaccessible files, uncertain accusation, and a subject who cannot locate the point where the system can be answered.

This distinction matters because bad metaphors produce bad governance. If privacy is imagined only as secrecy, then any fact that is not fully secret may appear fair game. If privacy is imagined only as control, then a long notice, a consent screen, or an opt-out maze may look like a solution. Solove's book asks for a broader account: privacy as protection against institutional power that collects, processes, shares, and acts on personal information without meaningful participation from the people described.

The Kafka frame also explains why database power often feels banal. There may be no single villain, no dramatic disclosure, no room full of monitors. There are vendors, forms, data brokers, public records, background-screening firms, access controls, matching rules, security exceptions, credit files, and bureaucratic habits. Harm arrives as delay, denial, suspicion, misclassification, exposure to fraud, inability to correct a file, or a decision that seems to come from nowhere.

That is the world many AI systems now enter. Models do not replace the dossier machine. They plug into it. They summarize records, infer missing traits, score risk, personalize offers, route applicants, flag behavior, and generate explanations around data trails that were already unevenly visible and hard to contest. The resulting governance question is not only "Was the model accurate?" It is "What record did the model inherit, who can see it, and what path exists for correction or appeal?" See also notice and appeal, right to explanation, and opaque scoring systems.

The AI-Age Reading

Read in 2026, The Digital Person looks like a prehistory of automated personhood. Solove wrote before smartphones, social graphs, real-time bidding, large language models, and current AI agents became ordinary infrastructure. Yet the book identifies the condition that makes all of them politically serious: institutions increasingly act on data doubles.

AI intensifies this by making dossiers active. A database stores and retrieves. A model infers, ranks, summarizes, predicts, drafts, recommends, and sometimes acts. The old dossier said, "Here is what the record contains." The AI-era dossier says, "Here is what this pattern probably means, what should happen next, and how the decision can be justified in language."

This changes the stakes of privacy. A person's data trail can become training material, retrieval context, personalization memory, risk signal, customer-service context, fraud score, hiring feature, ad target, educational profile, law-enforcement lead, or chatbot prompt history. The issue is not just whether an individual fact was public or private. The issue is whether the institutional model built from those facts can be inspected, corrected, limited, forgotten, or refused.

Solove's later article "A Taxonomy of Privacy" helps make this explicit by separating privacy problems into information collection, processing, dissemination, and invasion. That structure is useful for AI because harms often move across stages. A system may collect innocuous traces, process them into sensitive inferences, disseminate outputs through vendors or agencies, and then turn the result into an intervention in housing, work, credit, policing, education, or care.

The book also helps avoid a common mistake in AI debates: treating the model as the entire problem. The model is important, but the surrounding dossier machine supplies the raw material, institutional authority, and operational route. A good model attached to an unjust record system can still produce unjust governance. A transparent model attached to unappealable records can still leave people powerless. That connects privacy directly to AI data provenance, AI data retention, AI audit trails, and digital identity.

Governance and Safety

The first practical control is a dossier ledger. An institution that uses personal data in consequential decisions should be able to name the source fields, vendors, derived features, inference steps, model or rules version, recipients, retention periods, correction path, appeal path, and deletion obligations. Without that map, privacy promises collapse into slogans because no one can tell where the institutional person was assembled.

The ledger should be decision-facing, not just an internal IT inventory. If a tenant, applicant, patient, student, claimant, worker, or customer asks why something happened, the institution should be able to reconstruct the source-to-decision path: which acquired or inferred fields mattered, what confidence or match threshold was used, what model or rule version acted, what vendor supplied the signal, what human review occurred, and which correction or appeal deadline applies. That is the difference between a privacy file and a due-process file.
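One way to picture a decision-facing ledger entry is as a record that can report its own gaps. The sketch below is an assumption-laden illustration: DecisionRecord, its field names, and the gaps function are hypothetical and simply restate the source-to-decision elements named above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DecisionRecord:
    """Hypothetical ledger entry for one consequential decision."""
    fields_used: Optional[list] = None            # acquired or inferred fields that mattered
    match_threshold: Optional[float] = None       # confidence or match threshold applied
    model_or_rule_version: Optional[str] = None   # what model or rule version acted
    vendor: Optional[str] = None                  # who supplied the signal
    human_review: Optional[str] = None            # what human review occurred
    appeal_deadline: Optional[str] = None         # correction or appeal deadline


def gaps(record):
    """List the elements the institution could not reconstruct for this decision."""
    return [
        f.name
        for f in fields(record)
        if getattr(record, f.name) in (None, [], "")
    ]


# Hypothetical example: a tenant-screening denial with an incomplete trail.
record = DecisionRecord(
    fields_used=["eviction_match", "address_history"],
    match_threshold=0.8,
    vendor="screening vendor A",
)
print(gaps(record))
```

A record with a non-empty gap list is a privacy file at best; under this sketch, only an entry with no gaps would support the reconstruction a due-process file requires.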

The second control is purpose discipline. Data minimization is not only "collect less." It means collect for a defined purpose, keep the smallest workable record, avoid sensitive inferences unless necessary, separate authentication from profiling, and prevent data gathered for service from becoming quiet evidence for discipline, pricing, exclusion, or model training. NIST's Privacy Framework is useful here because it treats privacy as an enterprise risk-management problem rather than a one-time disclosure notice.
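Purpose discipline can be sketched as a deny-by-default lookup. This is an illustration under stated assumptions, not a NIST-specified mechanism: the ALLOWED_USES table, its field and purpose names, and is_permitted are all hypothetical.

```python
# Hypothetical purpose register: each field lists the uses declared at collection.
ALLOWED_USES = {
    "email": {"authentication", "service_notices"},
    "support_transcript": {"service_delivery"},
}


def is_permitted(field_name, requested_use, allowed=ALLOWED_USES):
    """Deny by default: unknown fields and undeclared uses are both refused."""
    return requested_use in allowed.get(field_name, set())


print(is_permitted("email", "authentication"))
print(is_permitted("support_transcript", "model_training"))
```

The design choice that matters is the default. A register that permits anything not explicitly forbidden lets service data drift into profiling or training without anyone deciding that it should.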

The third control is deletion that reaches derived artifacts. A brokered address, a location trail, or a behavioral segment can survive as a score, embedding, risk label, model memory, or downstream export even after the source file is gone. Serious deletion practice has to cover indexes, caches, retrieval stores, personalization memories, vendor subprocessors, and future refreshes. Otherwise the person deletes the visible record while the operational dossier keeps speaking.
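Deletion that reaches derived artifacts is, at bottom, a traversal of a lineage graph. The following is a minimal sketch assuming the institution already maintains such a graph; the lineage mapping, the artifact names, and artifacts_to_delete are hypothetical.

```python
from collections import deque


def artifacts_to_delete(lineage, source):
    """Breadth-first walk from a source record to everything derived from it.

    `lineage` maps an artifact name to the artifacts built directly from it.
    """
    seen = {source}
    queue = deque([source])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in lineage.get(node, []):
            if child not in seen:   # an artifact may have several parents
                seen.add(child)
                queue.append(child)
    return order


# Hypothetical lineage for one location trail.
lineage = {
    "location_trail": ["home_inference", "ad_segment"],
    "home_inference": ["risk_score"],
    "ad_segment": ["vendor_export"],
    "risk_score": ["vendor_export"],
}

targets = artifacts_to_delete(lineage, "location_trail")
print(targets)
```

The traversal is only as good as the graph. Any score, embedding, or export that was never recorded as derived from the source will survive the deletion, which is why the lineage map is itself a governance obligation.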

The fourth control is contestability. In high-impact domains such as housing, employment, credit, insurance, health care, education, public benefits, law enforcement, immigration, and child-directed services, a dossier-mediated decision should carry notice, reasons, evidence access, correction, human review, and appeal. Adverse-action explanations and automated welfare decisions show why explanation without a remedy is not enough.

The fifth control is procurement discipline. Public agencies and regulated institutions should not treat "commercially available" as a synonym for safe, consensual, or fair. Vendor contracts need data lineage, reuse limits, retention schedules, audit access, incident notice, subcontractor visibility, deletion propagation, and a ban on using public-service interactions as an unreviewed training or profiling supply chain. The same logic applies to vendor and platform governance and to privacy and data stewardship.

For AI systems, the safety case also has to identify where personal data enters the system: training sets, fine-tuning, retrieval corpora, embeddings, prompt logs, agent tools, memory, analytics, evaluation data, abuse monitoring, and human-review queues. NIST's AI Risk Management Framework is helpful because its govern, map, measure, and manage functions force the organization to connect data flows to actual harms, responsibilities, and post-deployment monitoring.

Where the Book Needs Updating

The Digital Person is a 2004 book, and it shows. Its examples center on spyware, web bugs, data mining, airline passenger profiling, public records, the USA PATRIOT Act, identity theft, and database sharing between business and government. Those examples are still relevant, but the information environment has become more intimate, mobile, social, biometric, and generative.

The book also works mostly through U.S. privacy law and legal reform. That focus gives it rigor, but it can understate how much privacy now depends on platform design, procurement rules, labor rights, competition policy, standards bodies, public infrastructure, and international regulation. The problem is not only what courts recognize as privacy harm. It is also who builds the systems that make people administratively real.

The Digital Person likewise predates modern privacy engineering practice. It does not give readers a complete vocabulary for data-protection impact assessments, machine-learning data sheets, model cards, synthetic data claims, vector databases, agent tool permissions, or enterprise AI logs. Those are not weaknesses in Solove's argument; they are the next layer of implementation.

Still, the age of the book is part of its value. It reminds readers that AI did not invent the crisis of machine-readable personhood. AI inherits decades of database practice, data-broker economics, public-private information exchange, weak consent rituals, and bureaucratic deference to records. The novelty is not that institutions have started making data doubles. The novelty is that those doubles can now be scored, narrated, simulated, and acted on with much greater speed.

What This Changes

The practical lesson of The Digital Person is that a person cannot be protected only at the moment of exposure. Protection has to cover the whole life of a record: collection, combination, inference, access, retention, sharing, decision, appeal, deletion, and reuse.

That lesson matters for AI agents, answer engines, automated welfare systems, hiring tools, companion memories, educational profiles, and workplace dashboards. These systems do not simply process information. They create institutional versions of people and then make those versions consequential.

A healthier system needs data minimization, real deletion, purpose limits, audit trails, human appeal, source visibility, correction rights, procurement discipline, and refusal paths that do not punish people for declining the dossier. It also needs a cultural shift: records should be treated as partial administrative artifacts, not as the person rendered in digital form.

The recurring pattern is record, score, workflow, authority. First the institution records. Then it scores or classifies. Then the workflow adapts around the score. Then the adapted workflow becomes evidence that the score was reality all along. Solove's book matters because it names the quiet danger before the interface becomes intelligent. Once the dossier is accepted as the person, every later automation inherits a category error. The system is not merely learning about someone. It is learning from an institutional shadow and then asking the person to answer for it.

Source Discipline

This page separates book facts, interpretive claims, and current governance claims. NYU Press, the GW Law repository, Solove's author page, the Surveillance & Society review, and Solove's later taxonomy article support the bibliographic and conceptual account. FTC, CPPA, DOJ, CFPB, and NIST sources support the current regulatory and governance context.

The AI reading is an application of Solove's dossier framework, not a claim that the 2004 book predicted today's model architectures. This page makes no claim that any AI system is conscious, divine, or AGI. Complaints, proposed orders, proposed rules, registries, and guidance documents are treated according to their status. A proposed FTC order is not final until approved by the court; a withdrawn CFPB proposal is not operative law; a data-broker registry is not an endorsement; and a claim that data is public or commercially available is not the same thing as consent, fairness, or safety.


