Blog · Review Essay · Last reviewed June 16, 2026

Data Feminism and the Politics of Counting

Catherine D'Ignazio and Lauren F. Klein's Data Feminism is one of the most useful books for grounding AI governance in what happens before the model appears. Its central claim is not that data science needs a warmer vocabulary. It is that every dataset, chart, benchmark, and model carries a politics: who defines the categories, who performs the labor, who becomes visible, who is made absent, and who can challenge the result when the record starts acting on them.

For this review, data feminism means an operational discipline for data work under unequal conditions. It asks a system to disclose the social theory hidden in its categories, the institutional purpose behind its collection, the labor behind its cleaning and labeling, and the people who will bear the cost of errors.

The Book

Data Feminism was published by MIT Press in 2020 and remains available in an open-access MIT Press edition. MIT Press lists Catherine D'Ignazio and Lauren F. Klein as the authors and describes the book as a way of thinking about data science and data ethics through intersectional feminist thought.

The book organizes its argument around seven principles: examine power, challenge power, elevate emotion and embodiment, rethink binaries and hierarchies, embrace pluralism, consider context, and make labor visible. Those principles are not decorative ethics labels. They are inspection points for the whole data pipeline.

The useful definition is this: data feminism treats data science as a material practice of classification, visibility, and authority. A dataset is not only a technical asset. It is a claim about what exists, what matters, what can be ignored, and who gets to decide. The practical question is not only whether the data is biased; it is who has standing to say the data is unfit for this use, who can supply counter-evidence, and who can stop the system when the record is wrong. That is why the book belongs beside How Data Happened, All Data Are Local, and Sorting Things Out: all three ask how records become reality.

Data Work Is Power Work

The strongest lesson is that data does not arrive from nowhere. It is collected by institutions, shaped by forms, cleaned by workers, constrained by budgets, limited by law, and interpreted through social assumptions. A missing field, a default category, a proxy variable, or a dashboard color can become an administrative fate when an institution attaches consequences to it.

That makes data work power work. The person who designs a category can decide whether a household, disability, gender identity, language, job history, neighborhood, or debt relation becomes legible. The person who controls a dashboard can decide which harms are monitored and which remain anecdotal. The person who defines a benchmark can decide which capabilities will look measurable and which failures will look invisible.

This is the bridge to AI governance. Before a model predicts, a data pipeline has already made decisions about the world. The model inherits those decisions and may amplify them through scale, fluency, and automation. A system that sounds neutral can still carry the theory of the institution that built the table.

Absence and Category

Data Feminism is especially strong on the politics of absence. Missing data is not always a technical defect. Absence can reflect neglect, danger, privacy, legal exclusion, bad instrumentation, institutional disinterest, or deliberate refusal. Treating every gap as noise can erase exactly the people a justice-oriented system should notice.

The hard part is that absence has more than one meaning. Sometimes the right response is to collect data because official ignorance protects abuse. Sometimes the right response is not to collect because visibility increases surveillance or retaliation. The governance question is therefore not "more data or less data" in the abstract. It is: who needs the record, who controls it, what harm does collection prevent, what harm does collection create, and who can refuse or correct the frame?

Categories create the same problem at the point of entry. A binary field can make a person administratively impossible. A race or disability category can enable civil-rights enforcement in one setting and risk profiling in another. A fraud-risk label can become a lifelong mark if there is no path to correction. Feminist data practice asks whether the category is necessary, proportional, context-specific, contestable, and maintained by people accountable to those it names.

The AI-Age Reading

In the AI era, data feminism is not a niche correction to model culture. It is upstream safety work. Training data, benchmark data, evaluation data, red-team data, metadata, labels, retrieval indexes, prompts, logs, and user telemetry all carry politics before they carry signal.

Generative AI intensifies the problem because data can be laundered through fluency. A model can summarize a biased archive in a neutral tone, generate confidence from an incomplete corpus, or turn a contested category into an apparently settled answer. The interface hides the politics of counting behind the smoothness of language.

Agentic systems raise the stakes again. When a model can classify, recommend, message, escalate, deny, route, purchase, or trigger an institutional workflow, the old dataset question becomes an action question. What did the system see? What did it fail to see? Which category activated the next step? Who receives notice? Who can appeal? Who can stop the loop?

That is why this book reads naturally beside Atlas of AI, Algorithms of Oppression, and Weapons of Math Destruction. The shared concern is not whether machines are conscious or mystical. It is how machine-readable categories become institutional authority.

Governance and Safety

As of June 16, 2026, AI governance has moved toward the book's terrain. Article 10 of the EU AI Act requires data governance practices for the training, validation, and testing datasets behind high-risk AI systems. The official text names design choices, data origin, original purpose for personal data, annotation, labeling, cleaning, assumptions about what data measure, bias examination, mitigation, and relevant data gaps. Article 113 sets the Act's general application date at August 2, 2026, with some obligations staged separately.

Article 10 also shows the tradeoff at the center of the book. Bias detection may require attention to sensitive attributes, but the Act allows processing of special categories of personal data for that purpose only when strictly necessary and with safeguards such as purpose limits, access controls, security measures, deletion, and records explaining why other data would not work. That is data feminism as governance: visibility can be necessary for accountability, but visibility without limits becomes surveillance.

For general-purpose AI, Article 53 requires providers to draw up and make publicly available a sufficiently detailed summary of the content used for training, according to a template from the AI Office. The European Commission published that public-summary template on July 24, 2025 and last updated the page on March 26, 2026. A public summary is not full provenance, dataset release, consent review, labor disclosure, or community accountability. It is a floor for external scrutiny, not a substitute for the records a deployer needs before putting a model inside a consequential workflow.

NIST's AI Risk Management Framework gives the operational grammar: govern, map, measure, and manage. Data feminism makes those verbs more concrete. "Map" should include the power map of the dataset: source, collector, consent or legal basis, category definitions, affected groups, labor conditions, known gaps, intended use, disallowed use, update cadence, and recourse path. "Measure" should include subgroup performance, representational harms, data gaps, and the costs of false positives and false negatives. "Manage" should include the authority to pause or withdraw a system when its data frame fails.
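One way to make the "Measure" step concrete is to report error rates per subgroup instead of a single aggregate. The sketch below is a minimal illustration, not a method from the book or from NIST: the function name, the tuple layout, and the sample groups are all hypothetical. It returns None where a group has no cases of the relevant kind, so that an unmeasured rate is never reported as zero.

```python
from collections import defaultdict


def subgroup_error_rates(records):
    """Return false-positive and false-negative rates per subgroup.

    `records` is an iterable of (group, actual, predicted) tuples with
    boolean `actual` and `predicted` values (a hypothetical layout).
    A rate is None when the group has no negatives or no positives.
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for group, actual, predicted in records:
        c = counts[group]
        if actual:
            c["pos"] += 1
            if not predicted:
                c["fn"] += 1  # real case the system missed
        else:
            c["neg"] += 1
            if predicted:
                c["fp"] += 1  # person flagged without cause

    rates = {}
    for group, c in counts.items():
        rates[group] = {
            "false_positive_rate": c["fp"] / c["neg"] if c["neg"] else None,
            "false_negative_rate": c["fn"] / c["pos"] if c["pos"] else None,
            "n": c["pos"] + c["neg"],
        }
    return rates


# Hypothetical sample: group_b contains no negative cases at all.
sample = [
    ("group_a", True, True),
    ("group_a", False, False),
    ("group_a", False, True),
    ("group_a", True, True),
    ("group_b", True, False),
    ("group_b", True, True),
    ("group_b", True, False),
]
rates = subgroup_error_rates(sample)
```

The None value is the design choice that matters here. A dashboard that prints 0% for a group with no negative cases turns a data gap into an apparent success, which is the failure the book warns about.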

Dataset documentation practices help turn this into paperwork an auditor can inspect. Datasheets for Datasets proposes documenting motivation, composition, collection, preprocessing, recommended uses, and maintenance. Data Cards frame documentation as a product for multiple audiences across the dataset lifecycle. The Data Provenance Initiative's 2024 audit of more than 1,800 text datasets found major licensing and attribution problems, including widespread unspecified or miscategorized licenses. That empirical gap matters because a model cannot respect people, licenses, labor, or context it has been engineered not to remember.

The audit record should preserve absences as well as entries. It should say whether a group was never measured, measured under a harmful category, suppressed for privacy or safety, hidden by aggregation, excluded by licensing, or removed during filtering. Those are different conditions with different remedies. Treating them all as "missing data" produces weak governance.
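The six conditions above can be kept distinct in the record itself. The following sketch assumes a simple audit log of (group, reason) pairs; the enum names, the function, and the example groups are hypothetical. It refuses a generic "missing" label so that each absence has to be assigned to one of the named conditions.

```python
from enum import Enum


class Absence(Enum):
    """The six kinds of absence named in the paragraph above."""
    NEVER_MEASURED = "never measured"
    HARMFUL_CATEGORY = "measured under a harmful category"
    SUPPRESSED = "suppressed for privacy or safety"
    AGGREGATED = "hidden by aggregation"
    LICENSE_EXCLUDED = "excluded by licensing"
    FILTERED_OUT = "removed during filtering"


def group_absences(entries):
    """Group (group_name, Absence) pairs by reason.

    Raises TypeError for any reason that is not an Absence member,
    so a bare string such as "missing" cannot enter the record.
    """
    grouped = {}
    for group, reason in entries:
        if not isinstance(reason, Absence):
            raise TypeError(
                f"{group!r}: reason must be an Absence member, got {reason!r}"
            )
        grouped.setdefault(reason, []).append(group)
    return grouped


# Hypothetical audit entries for illustration only.
entries = [
    ("rural clinics", Absence.NEVER_MEASURED),
    ("small-cell counts", Absence.SUPPRESSED),
    ("tenant unions", Absence.NEVER_MEASURED),
]
by_reason = group_absences(entries)
```

Grouping by reason lets a reviewer route each absence to a different remedy, instead of sending every gap to the same imputation step.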

The safety implication is plain. High-impact AI systems should not launch on undocumented data folklore. They need provenance records, category reviews, affected-community review where appropriate, privacy and data-minimization analysis, rights or licensing checks, labor disclosure, monitoring plans, appeal routes, and a named owner with power to halt use. Otherwise a system can turn old omissions into new decisions faster than affected people can prove the omission was there.

Where the Book Needs Friction

The book's moral clarity can be misused if readers treat it as a license to declare every dataset oppressive or every data project suspect. Some forms of counting are essential for civil-rights enforcement, public health, labor protection, environmental justice, and institutional accountability. A community harmed by official silence may need better data, not less data.

The harder lesson is contextual judgment. Data can expose harm and data can enable control. Pluralism can improve knowledge and pluralism can create real disagreement about what should be counted. Privacy can protect vulnerable people and privacy can be invoked by powerful institutions to hide abuse. The book is most useful when it forces those tradeoffs into the open instead of pretending technique has already settled them.

It also predates the current wave of foundation models, synthetic data, retrieval-augmented generation, model cards at scale, training-data litigation, and public-summary obligations under the EU AI Act. The principles still travel, but the implementation has to be updated: provenance must cover massive data mixtures, generated data, dataset repackaging, evaluation leakage, and downstream logs that become future training material.

What This Changes

Data Feminism changes the order of audit. Do not begin with the model and then ask whether the output seems fair. Begin with the world the model was allowed to see, the categories used to make that world computable, the absences treated as irrelevant, the labor hidden behind "automation," and the people denied the power to contest the frame.

The practical checklist is direct: name the data source, name the institution that collected it, state the original purpose, preserve the category definitions, record who is missing, document the cleaning and labeling labor, test the system on the people it will affect, explain what the data cannot support, and give affected people a correction path before the score becomes action.
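The checklist can be held as a structured record so that unanswered items are visible before launch. This is a minimal sketch under stated assumptions: the class name, the field names, and the sample values are hypothetical, and free-text answers stand in for whatever documentation an organization actually keeps.

```python
from dataclasses import dataclass, fields


@dataclass
class DataChecklist:
    """One free-text answer per checklist item; empty means unanswered."""
    data_source: str = ""
    collecting_institution: str = ""
    original_purpose: str = ""
    category_definitions: str = ""
    who_is_missing: str = ""
    cleaning_and_labeling_labor: str = ""
    affected_population_testing: str = ""
    unsupported_uses: str = ""
    correction_path: str = ""

    def unanswered(self):
        """Return the names of items that are blank or whitespace only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def ready_for_review(self):
        """True only when every item has some answer recorded."""
        return not self.unanswered()


# Hypothetical draft with only the first three items filled in.
draft = DataChecklist(
    data_source="county benefits case files",
    collecting_institution="county human services agency",
    original_purpose="eligibility determination",
)
open_items = draft.unanswered()
```

A complete record only shows that each question received an answer. Whether the answers are adequate still depends on review by people with standing to contest them.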

The book's deeper contribution is to make neutrality expensive. A system may still claim objectivity, but it has to show the record of how it got there: what it counted, what it excluded, who checked the categories, who can challenge the output, and who is responsible when the data story injures someone.

Source Discipline

This review separates book facts, interpretive claims, and current governance claims. MIT Press and the open-access edition support publication details and the book's principles. EU AI Act and European Commission sources support legal and timeline claims. NIST supports the risk-management vocabulary. Dataset-documentation and provenance papers support the implementation claims about datasheets, Data Cards, and licensing or attribution gaps.

The AI-era reading is an application of the book's framework, not a claim that D'Ignazio and Klein predicted every feature of generative AI, agents, or 2026 regulation. This page does not claim that any AI system is conscious, divine, or AGI. It treats AI systems as institutional machinery that can classify, summarize, persuade, and trigger action when organizations give them authority.


