Blog · Review Essay · Last reviewed June 23, 2026

Discriminating Data and the Politics of Recognition

Wendy Hui Kyong Chun's Discriminating Data is a hard book for the age of machine learning because it refuses the easy story that biased systems merely fail. It asks what happens when recognition itself becomes a sorting machine.

Here, algorithmic recognition means the institutional practice of turning likeness into action: people, places, faces, words, habits, and relations are clustered into neighborhoods of predicted behavior, then routed toward suspicion, recommendation, eligibility, exposure, price, attention, or denial. The danger is not only misrecognition. It is recognition becoming a way to make inherited social order operational.

The Book

Discriminating Data: Correlation, Neighborhoods, and the New Politics of Recognition was published by the MIT Press. The publisher lists Wendy Hui Kyong Chun as author, the hardcover ISBN as 9780262046220, the hardcover publication date as November 2, 2021, and the paperback ISBN as 9780262548526 with a March 5, 2024 publication date. Penguin Random House's distribution listing for the MIT Press gives the paperback as 344 pages. Amazon's paperback product page uses 0262548526, the paperback ISBN-10, as its product identifier.

The book belongs beside Chun's earlier Control and Freedom and Updating to Remain the Same, but it is more directly aimed at machine learning. Its subject is not simply data bias. It is the deeper political fantasy that a population can be made legible by finding patterns of likeness, grouping people by correlation, and then calling the resulting recognition objective.

MIT Press describes the book as an argument about how big data and machine learning encode discrimination and produce clustered sameness. That description matters because Chun is not writing only about a defect in training data. She is asking why machine learning so often treats similarity, neighborhood, authenticity, and predictability as the route to knowledge.

Current Context

As of June 23, 2026, the politics of recognition is no longer confined to face recognition or recommender feeds. It appears in hiring screens, tenant screening, credit and insurance pricing, benefits fraud detection, school analytics, workplace monitoring, clinical triage, public-sector case routing, AI search, content moderation, and agentic systems that can act across records and accounts.

The governance context now names several pieces of Chun's problem. NIST Special Publication 1270 treats AI bias as a sociotechnical issue involving data, models, human behavior, and the social context of deployment. NIST's AI Risk Management Framework is voluntary, but it gives organizations a govern, map, measure, and manage vocabulary for AI risks. NIST's face-recognition evaluation work shows why context matters: its 2019 demographic-effects report tested nearly 200 algorithms from nearly 100 developers and found demographic differentials whose meaning depends on algorithm, application, and data.

The legal context is also more concrete. The EU AI Act treats biometrics, education, employment, law enforcement, access to essential services, migration, and justice as high-risk contexts when covered by Annex III and related classification rules. Article 10 requires data governance practices for high-risk systems, including bias detection and mitigation; Article 27 requires certain deployers to perform fundamental-rights impact assessments before use. In the United States, the 2023 FTC, DOJ, CFPB, and EEOC joint statement says existing civil-rights, consumer-protection, fair-competition, and equal-opportunity laws apply to automated systems. The CFPB has said black-box credit models do not excuse failure to give specific adverse-action reasons, and the EEOC's iTutorGroup settlement shows automated screening can be treated as ordinary employment discrimination.

Correlation Is Not Innocent

Chun's strongest move is to treat correlation as a historical and political instrument, not just a statistical technique. Predictive systems act through resemblance: people who look, act, click, buy, move, speak, or associate like others are treated as likely to share a future. The problem is not only that the resemblance can be wrong. The problem is that resemblance is already socially organized.

This matters for AI because machine learning often turns inherited relations into operational defaults. A model does not need an explicit racial category, class label, or political identity to reproduce social structure. Proxies, neighborhoods, embeddings, and interaction patterns can carry the work. The result is a system that appears to discover groups while helping to harden them.
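
A minimal sketch of the proxy effect, using invented toy records rather than anything from Chun's book or a real system. The hypothetical `approve` rule never reads the `group` field; it looks only at an assumed `district` code. Because district and group are correlated in the toy data, approval rates still split by group.

```python
from collections import defaultdict

# Toy records (invented for illustration): group membership is correlated
# with district, as it would be under residential segregation.
applicants = [
    {"group": "A", "district": "north"},
    {"group": "A", "district": "north"},
    {"group": "A", "district": "north"},
    {"group": "A", "district": "south"},
    {"group": "B", "district": "south"},
    {"group": "B", "district": "south"},
    {"group": "B", "district": "south"},
    {"group": "B", "district": "north"},
]

def approve(applicant):
    """Hypothetical rule that looks only at district, never at group."""
    return applicant["district"] == "north"

def approval_rate_by_group(records):
    """Share of approved applicants within each group."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for record in records:
        totals[record["group"]] += 1
        approved[record["group"]] += int(approve(record))
    return {group: approved[group] / totals[group] for group in totals}

rates = approval_rate_by_group(applicants)
print(rates)  # {'A': 0.75, 'B': 0.25}
```

Removing the `group` column changes nothing here, which is the point: the district carries the work.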

The important word is neighborhood. A neighborhood in machine learning can be mathematical rather than municipal: a region of feature space, a cluster of embeddings, a set of nearest neighbors, a graph community, or a profile cohort. But the political problem is familiar. Once the system treats neighborhood as destiny, the person inherits expectations built from people the system has made similar to them.
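
One way to picture a mathematical neighborhood is a nearest-neighbor vote. The sketch below is an illustration under stated assumptions, not a method taken from the book: the points, the labels "flagged" and "cleared", and the function names are all invented. A newcomer receives whatever outcome is most common among the closest past records.

```python
import math
from collections import Counter

# Invented points in a two-dimensional feature space, each with a past outcome.
history = [
    ((1.0, 1.0), "flagged"),
    ((1.2, 0.9), "flagged"),
    ((0.9, 1.3), "cleared"),
    ((4.0, 4.2), "cleared"),
    ((4.1, 3.9), "cleared"),
    ((3.8, 4.0), "cleared"),
]

def nearest_neighbors(point, records, k):
    """Return the k records closest to point by Euclidean distance."""
    return sorted(records, key=lambda rec: math.dist(point, rec[0]))[:k]

def inherited_label(point, records, k=3):
    """Assign the most common outcome among the k nearest past records."""
    votes = Counter(label for _, label in nearest_neighbors(point, records, k))
    return votes.most_common(1)[0][0]

newcomer = (1.1, 1.1)
print(inherited_label(newcomer, history))  # flagged
```

The newcomer has no history of their own in this sketch. The label comes entirely from where the system has placed them.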

This is a theory of algorithmic belief formation. The system says: you are like these people, so you will want this, fear that, fail here, buy this, belong there, or deserve this level of scrutiny. The user is then shown a world shaped by that classification. Prediction becomes training. Recognition becomes a loop.

The Recognition Trap

The politics of recognition has usually been framed as a demand to be seen. Chun asks what happens when being seen by computational systems means being sorted into managed similarity. In recommender systems, personalization can narrow a field of encounter. In facial recognition, recognition can become an infrastructure of suspicion. In social platforms, homophily can be treated as natural affinity even when the platform architecture amplifies sameness.

The book is valuable because it does not stop at saying that datasets are unrepresentative. That claim is true but incomplete. A more representative dataset can still support a harmful classificatory regime if the goal remains prediction through social sorting. The question is not only whose data are missing. It is what kind of world the system is trying to make predictable.

NIST's face-recognition work is useful context here because it treats demographic differentials as measurable questions of system performance rather than impressions. That is necessary. Chun's contribution is to press the prior question: why are these systems being asked to recognize, cluster, and operationalize identity in the first place?

That question matters outside biometrics. A search engine recognizes authority. A recommender recognizes taste. A hiring tool recognizes fit. A fraud system recognizes suspicion. A school platform recognizes promise or risk. A workplace dashboard recognizes productivity. Each recognition is also a claim about what the institution is allowed to ignore.

Homophily and Polarization

Chun's account of homophily is one of the book's strongest contributions. Homophily is often summarized as the tendency of similar people to associate with one another. In data systems, that idea can become a design premise: people are grouped with similar others, shown more of what resembles prior behavior, and then measured by how predictably they remain inside the group.

This is where recognition becomes recursive. A recommender finds similarity, presents similarity, rewards similarity, and then records the user's response as evidence of authentic preference. The platform can call the result personalization even when the system has helped produce the taste it claims merely to discover.
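
The loop can be sketched with a deliberately crude simulation. Everything here is an assumption made for illustration: a hypothetical greedy recommender that always shows the topic with the most recorded clicks, and a user who would click either topic if shown it.

```python
def run_feed(true_interest, rounds):
    """Greedy recommender: always show the topic with the most recorded clicks."""
    clicks = {topic: 0 for topic in true_interest}
    for _ in range(rounds):
        # Ties go to the first topic in the dictionary.
        shown = max(clicks, key=clicks.get)
        # The user clicks anything they like, but only sees what is shown.
        if true_interest[shown]:
            clicks[shown] += 1
    return clicks

# This user likes both topics equally.
user = {"jazz": True, "folk": True}
log = run_feed(user, rounds=10)
print(log)  # {'jazz': 10, 'folk': 0}
```

The click log reads as a strong preference for jazz, yet the user's interest in folk was never tested. The record reflects what was presented, not only what was wanted.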

The governance problem is not solved by saying "diversify the feed" or "remove the protected attribute." Proxies can preserve old divisions, and shallow diversity can still route people through a system whose objective is engagement, conformity, or manageable prediction. A serious review has to ask what kinds of difference the system can tolerate, what kinds it suppresses, and whether the user can escape the neighborhood without penalty.

The Governance Reading

Read in 2026, Discriminating Data is a governance book even when it is not written in the idiom of compliance. NIST's AI Risk Management Framework addresses AI risk across design, development, use, and evaluation. The European Commission's AI Act page presents risk-based rules and identifies areas such as biometrics, education, employment, and law enforcement as high-risk contexts. Those frameworks make one thing clear: discriminatory data practices are not only research problems. They are institutional deployment problems.

Chun's book sharpens that lesson. Governance cannot be limited to model accuracy or post-deployment audits. It must ask about defaults, categories, optimization goals, data lineage, proxies, feedback loops, affected communities, and the right to refuse classification. A system can be accurate and still politically destructive if it accurately reproduces a segregated world.

A recognition audit should therefore ask:

Recognition target: What is the system trying to recognize: identity, risk, taste, fraud, fit, credibility, emotion, productivity, or belonging?
Neighborhood logic: Which features, embeddings, graph relations, labels, or proxies make people similar enough to be acted on together?
Default person: Whose behavior, face, dialect, body, history, or institutional record becomes the norm against which others are measured?
Exposure pattern: Who meets the system as convenience, and who meets it as suspicion, denial, surveillance, or forced correction?
Feedback loop: How do prior classifications, patrols, clicks, denials, complaints, rankings, or appeals become new training or evaluation data?
Recourse: Can affected people know the system was used, inspect or correct relevant data, contest the category, reach a human with authority, and obtain a changed result?

Those questions belong in procurement as much as ethics review. Vendor claims about fairness should be tied to system purpose, deployment setting, subgroup and intersectional testing, data provenance, logging, appeal support, monitoring, and a retirement condition. A model that cannot explain its neighborhood logic should not quietly become the front door to work, school, housing, care, credit, liberty, or public standing.
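
For teams that want the six audit questions above in a reviewable form, one option is a small structured record that reports which questions remain unanswered. This is a sketch only: the class name, field names, and example answers are hypothetical and do not come from any standard or from the book.

```python
from dataclasses import dataclass, fields

@dataclass
class RecognitionAudit:
    """One free-text answer per audit question; empty means unanswered."""
    recognition_target: str = ""
    neighborhood_logic: str = ""
    default_person: str = ""
    exposure_pattern: str = ""
    feedback_loop: str = ""
    recourse: str = ""

def unanswered(audit):
    """Names of the audit questions that still have no answer."""
    return [f.name for f in fields(audit) if not getattr(audit, f.name).strip()]

# Invented example: a draft audit with only two questions answered.
draft = RecognitionAudit(
    recognition_target="fit for a warehouse role",
    neighborhood_logic="resume embeddings compared with past hires",
)
print(unanswered(draft))
# ['default_person', 'exposure_pattern', 'feedback_loop', 'recourse']
```

A record like this does not make the answers good. It only makes the gaps visible before a purchase or deployment decision is signed.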

Where the Book Needs Care

The book's language is dense, and that density matters. Chun is writing across media theory, statistics, race, platform studies, and political critique. The price is that some readers looking for a procurement checklist or technical audit method will have to do translation work. This is not exactly a weakness, but it affects how the book travels into policy rooms.

The other caution is that "correlation" can become too capacious if used as a universal villain. Some correlations are useful, some are dangerous, and many are only meaningful inside a specific decision setting. The book is at its best when it keeps the question concrete: what relation is being measured, who benefits from making it predictive, and what alternatives are foreclosed once the pattern becomes policy?

The same care applies to biometrics and recommendation. A benchmark can identify measurable differentials without answering whether the system should be deployed in a given context. A recommender can widen discovery in one setting and deepen segregation in another. A clinical risk model, a search ranking, and a police watchlist are not the same artifact. Chun's vocabulary is strongest when it forces more exact questions rather than treating every correlation as morally identical.

What This Changes

Discriminating Data gives this archive a vocabulary for the social life of machine learning. Ask not only whether a system is biased, but what model of likeness it builds. Ask how it defines neighborhoods, what it treats as normal, what it recognizes as signal, and what gets disciplined as noise. Ask whether the system opens new possibilities or makes old categories more durable.

The practical lesson is sober: desegregating AI is not achieved by adding a fairness metric to an unchanged architecture of recognition. It requires different defaults, different publics, different data practices, and different rights around refusal, contestation, and repair. Chun's book is a warning against mistaking visibility for justice. To be recognized by a machine is not the same as being understood.

That is the link to the site's recurring concern with machine-readable reality. Records feed models; models group people; institutions act on the groups; those actions create new records; the next system treats the accumulated record as evidence. The loop can make old categories look like fresh discoveries because the world has been organized to confirm them.

The answer is not to pretend institutions can act without categories. It is to keep categories provisional, visible, contestable, and accountable to the people they organize. Recognition should not be allowed to become an unappealable substitute for relation.

Source Discipline

This review separates book metadata, interpretive claims, technical evaluation, and current governance claims. MIT Press, Penguin Random House, and Amazon support edition and retail metadata. Chun's book supplies the conceptual argument about correlation, neighborhoods, homophily, and recognition. NIST sources support claims about face-recognition testing, demographic differentials, sociotechnical bias, and voluntary AI risk management. European Commission and AI Act Service Desk sources support AI Act implementation and high-risk governance claims. FTC, CFPB, EEOC, and NYC sources support current U.S. enforcement and notice/audit context.

The review does not claim that every correlated model is unlawful, every recommender polarizes users, or every biometric system has the same error profile. Claims about algorithmic discrimination require deployment-level evidence: the system version, domain, population, input data, proxy variables, performance by subgroup, human workflow, appeal route, and downstream effect.
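
As one illustration of "performance by subgroup", the sketch below computes a false positive rate per subgroup from labeled outcomes. The rows are invented, the subgroup names are placeholders, and false positive rate is only one of several metrics a deployment-level review might need.

```python
from collections import defaultdict

# Invented rows: (subgroup, truly_positive, flagged_by_system)
outcomes = [
    ("group_1", False, False),
    ("group_1", False, False),
    ("group_1", False, True),
    ("group_1", True, True),
    ("group_2", False, True),
    ("group_2", False, True),
    ("group_2", False, False),
    ("group_2", True, True),
]

def false_positive_rate_by_subgroup(rows):
    """Among true negatives in each subgroup, the share the system flagged."""
    negatives = defaultdict(int)
    false_positives = defaultdict(int)
    for subgroup, truly_positive, flagged in rows:
        if not truly_positive:
            negatives[subgroup] += 1
            if flagged:
                false_positives[subgroup] += 1
    return {group: false_positives[group] / negatives[group] for group in negatives}

fpr = false_positive_rate_by_subgroup(outcomes)
for group, rate in fpr.items():
    print(group, round(rate, 2))
```

In this toy data the second subgroup is wrongly flagged twice as often as the first. Whether a gap like that matters depends on the domain, the workflow, and what a flag triggers, which is why the evidence has to be gathered at the level of the deployment.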

This page makes no claim that any AI system is conscious, divine, or AGI. It treats AI systems as sociotechnical arrangements: data, models, interfaces, institutions, labor, law, infrastructure, and power.

Control and Freedom, Updating to Remain the Same, and Programmed Visions place Chun's machine-learning argument inside her wider work on networks, habit, software, race, and memory.
Race After Technology, Algorithms of Oppression, Data Feminism, and Sorting Things Out extend the analysis of racialized classification, search authority, power-aware evidence, and infrastructure categories.
Weapons of Math Destruction, Automating Inequality, and The Loop connect recognition to scoring, welfare automation, feedback loops, and routed choice.
Algorithmic Bias, Biometric Categorization, Recommender Systems, AI Audits and Assurance, Algorithmic Impact Assessments, Notice and Appeal, and AI Data Provenance are the operational layer.

Sources

MIT Press, Discriminating Data: Correlation, Neighborhoods, and the New Politics of Recognition, publisher listing for title, author, eBook ISBN 9780262367257, hardcover ISBN 9780262046220, paperback ISBN 9780262548526, publication dates, publisher, and description, reviewed June 23, 2026.
Penguin Random House, Discriminating Data by Wendy Hui Kyong Chun, distribution listing for title, author, subtitle, paperback ISBN 9780262548526, MIT Press imprint, publication date, and page count, reviewed June 23, 2026.
NIST, Face Recognition Vendor Test Part 3: Demographic Effects, official publication page for NISTIR 8280 on demographic effects in face recognition evaluation, reviewed June 23, 2026.
NIST, NIST Study Evaluates Effects of Race, Age, Sex on Face Recognition Software, December 19, 2019 summary of demographic-differential findings, reviewed June 23, 2026.
NIST, Towards a Standard for Identifying and Managing Bias in Artificial Intelligence, Special Publication 1270, sociotechnical AI bias guidance, reviewed June 23, 2026.
NIST AI Resource Center, AI Risk Management Framework and AI RMF Core, official AI RMF overview, voluntary-use statement, and govern, map, measure, manage functions, reviewed June 23, 2026.
European Commission, AI Act, official page for Regulation (EU) 2024/1689, risk-based AI rules, high-risk use cases, GPAI rules, and implementation timeline, reviewed June 23, 2026.
European Commission AI Act Service Desk, Article 10: Data and data governance and Article 27: Fundamental rights impact assessment for high-risk AI systems, official explorer text and summaries, reviewed June 23, 2026.
Federal Trade Commission, FTC, DOJ, CFPB, and EEOC joint statement on artificial intelligence and automated systems, April 25, 2023, reviewed June 23, 2026.
Consumer Financial Protection Bureau, Consumer Financial Protection Circular 2022-03, adverse-action notice requirements for complex credit algorithms, reviewed June 23, 2026.
Equal Employment Opportunity Commission, iTutorGroup settlement announcement, automated screening and age-discrimination allegations, reviewed June 23, 2026.
New York City Department of Consumer and Worker Protection, Automated Employment Decision Tools, official bias-audit, notice, public-summary, and complaint context, reviewed June 23, 2026.

Book links are paid affiliate links. As an Amazon Associate I earn from qualifying purchases.


Amazon, Discriminating Data by Wendy Hui Kyong Chun, retail listing and ISBN-10/ASIN product identifier 0262548526, reviewed June 23, 2026.
