Wiki · Person · Last reviewed June 19, 2026

Rumman Chowdhury

Rumman Chowdhury is a data scientist, social scientist, and responsible-AI practitioner known for turning AI accountability into practical evaluation infrastructure: enterprise risk tools, algorithmic bias bounties, public red teaming, and community-driven audits of model behavior.

Current Context

As of June 19, 2026, a current profile should distinguish between two related entities. Chowdhury's personal biography lists her as CEO and founder of Humane Intelligence, a public benefit corporation (PBC), and as co-founder of Humane Intelligence, the nonprofit. The nonprofit's own 2026 materials say she stepped down as nonprofit CEO in August 2025, that Mala Kumar became Executive Director in April 2026, and that Chowdhury remains a co-founder and distinguished advisor.

That distinction matters because Humane Intelligence has grown from running founder-led public red-team events into a broader evaluation organization. Its current nonprofit materials describe work on contextual AI evaluations and an ontology or knowledge-graph methodology, a move of bias bounties onto the Zindi data-science challenge platform, a planned open-source release of its red-teaming application around November 2026, and AI-in-public-health workstreams.

Chowdhury's public role therefore now sits at the intersection of founder, advisor, speaker, policy participant, and practitioner. Her PBC page describes her as the first U.S. Science Envoy for AI; the State Department page lists her among previously appointed Science Envoys and identifies her with Humane Intelligence and Harvard's Berkman Klein Center. Treat that as a public-diplomacy and expertise role, not as evidence that any specific Humane Intelligence method has government certification.

Responsible AI Practice

Chowdhury's career is rooted in applied algorithmic ethics rather than abstract AI commentary. Her own biography describes her as working at the intersection of data science, policy, and ethics to make AI systems more accountable and transparent.

Before Humane Intelligence, she built Accenture's Responsible AI practice, founded Parity as an algorithmic-audit platform, and later led Twitter's Machine Learning Ethics, Transparency and Accountability team. Harvard's Berkman Klein Center describes her work at Twitter as focused on identifying and mitigating algorithmic harms on the platform.

Twitter's 2021 responsible-machine-learning work is a useful concrete example. Its image-cropping review tested for race and gender disparities and was published with code and a paper; Twitter then reduced reliance on the saliency model after concluding that image cropping was better left to people using the product. The bias-bounty challenge extended that lesson by asking outside participants to identify harms in the same saliency model rather than leaving the company to find every failure internally.

One important thread is operationalization. Chowdhury's work asks how an organization turns values such as fairness, transparency, and accountability into tools, tests, incentives, documentation, disclosure, and public-facing processes.

That practical frame is why her work belongs beside AI assurance and public interest technology, not only AI ethics commentary. The durable claim is methodological: responsible AI work has to produce inspectable evidence, named owners, public feedback channels, and authority to change a system.

Humane Intelligence

Humane Intelligence was built to grow a community of practice for algorithmic evaluation. Its public materials describe programs in AI red teaming, contextual evaluations, bias bounties, policy, and software for collecting data from red-team exercises.

The organization matters because it treats AI evaluation as a social process. Instead of assuming that only labs, vendors, or auditors can test AI systems, it builds methods for expert groups, public participants, civil society, governments, and institutions to contribute evidence.

Humane Intelligence describes AI red teaming as a semi-structured approach to assess and improve AI model safety and effectiveness by identifying vulnerabilities, limitations, and areas for improvement. Its model is especially relevant where lived experience, language, culture, religion, geography, disability, gendered harm, or professional context changes what harm looks like.

The governance tradeoff is that public participation creates its own duties. Participants need clear scope, consent, compensation where appropriate, safety support, credit, privacy protections, and a route by which findings can change the system being tested. Community testing is not automatically democratic if the provider controls access, disclosure, scoring, and remediation.

Its contextual-evaluation materials sharpen that point. A one-off red-team workshop can produce useful examples, but a high-stakes evaluation also needs a map of the problem space, coverage gaps, severity definitions, benchmark limits, and retesting plans. Otherwise, broad participation can produce many anecdotes without enough structure for procurement, regulation, or release decisions.
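
The coverage-gap idea can be made concrete with a small sketch. The Python below is illustrative only: the topic and context labels, the coverage_gaps function, and the min_tests threshold are assumptions invented for this example, not part of any Humane Intelligence tool or method. It lists which cells of a planned problem-space map received too few test conversations.

```python
from collections import Counter
from itertools import product

# Hypothetical problem-space map: every (topic, context) pair the
# evaluation intends to cover. Labels are invented for illustration.
TOPICS = ["misinformation", "cybersecurity"]
CONTEXTS = ["English", "Spanish"]


def coverage_gaps(topics, contexts, test_log, min_tests=1):
    """Return planned (topic, context) cells with fewer than min_tests entries."""
    counts = Counter(test_log)  # missing cells count as 0
    return [cell for cell in product(topics, contexts) if counts[cell] < min_tests]


# Each log entry records which cell one red-team conversation exercised.
test_log = [
    ("misinformation", "English"),
    ("misinformation", "English"),
    ("cybersecurity", "English"),
]

gaps = coverage_gaps(TOPICS, CONTEXTS, test_log)
print(gaps)
```

In this toy log, both Spanish-language cells come back as gaps even though three conversations were recorded, which is the kind of structure a pile of anecdotes alone does not reveal.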

Public Red Teaming

Chowdhury was one of the named authors of AI Village's 2023 announcement for the DEF CON generative-AI red-team event. The event brought together AI Village, Humane Intelligence, SeedAI, AVID, policy partners, community groups, and model providers to test large language models in a public setting.

Humane Intelligence's DEF CON 2023 overview says 2,244 participants evaluated eight LLMs over 2.5 days and produced more than 17,000 conversations across 21 topics, including cybersecurity, misinformation, and human rights. AI Village framed the effort as a way to teach more people how to assess model limitations, not merely as a private safety exercise by model developers.

The significance of the event was not only technical. It adapted the culture of hacker contests and bug bounties to generative AI, while opening participation beyond a small set of internal lab testers. The method also exposed a governance limit: public red teaming can reveal vivid failures, but it does not by itself establish failure rates, complete coverage, or proof that providers fixed the underlying problem.

Humane Intelligence's later NIST-supported ARIA red-teaming exercise and its 2025 UNESCO playbook show the method becoming more institutional. That is the central arc of Chowdhury's public role: move AI accountability from closed review toward structured public feedback, while preserving enough scope, evidence, and follow-through for the exercise to matter.

Evaluation Governance

Chowdhury's work is best read as a bridge between participatory evaluation and formal assurance. A bias bounty or public red-team event can find harms that internal teams miss; it becomes governance-grade evidence only when the result is tied to a named model or product version, a defined harm taxonomy, participant qualifications or recruitment limits, scoring rules, privacy treatment, disclosure constraints, remediation status, and a plan to retest.
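
As a rough illustration of that checklist, a finding can be modeled as a record whose context fields must all be filled before it is treated as more than an anecdote. The sketch below is a minimal example under stated assumptions: the FindingRecord schema, its field names, and the all-fields-required rule are hypothetical and are not drawn from Humane Intelligence, NIST, or any published standard.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class FindingRecord:
    """One red-team or bias-bounty finding (hypothetical schema)."""
    description: str
    model_version: Optional[str] = None
    harm_category: Optional[str] = None
    recruitment_notes: Optional[str] = None
    scoring_rule: Optional[str] = None
    privacy_treatment: Optional[str] = None
    disclosure_constraints: Optional[str] = None
    remediation_status: Optional[str] = None
    retest_plan: Optional[str] = None


def missing_context(record: FindingRecord) -> List[str]:
    """Return the names of fields that are unset or blank."""
    return [
        f.name
        for f in fields(record)
        if not (getattr(record, f.name) or "").strip()
    ]


def is_governance_grade(record: FindingRecord) -> bool:
    """Assumed rule for this sketch: every context field must be filled in."""
    return not missing_context(record)


# A vivid but unscoped report: only the description is known.
anecdote = FindingRecord(description="Unsafe advice returned for a translated prompt")
print(missing_context(anecdote))
print(is_governance_grade(anecdote))
```

A real process would likely weigh these fields differently and validate their contents, not just their presence. The point of the sketch is only that the gap between an anecdote and usable evidence can be written down and checked.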

That evidence should connect to adjacent accountability tools such as AI audits, AI evaluations, model cards and system cards, incident reporting, and human oversight. NIST's AI Risk Management Framework and Generative AI Profile frame risk management as a lifecycle practice, which fits the lesson from Chowdhury's work: participatory tests should inform design, deployment, monitoring, and repair, not sit apart as a one-time public exercise.

The safety implications run in both directions. Broader participation can reveal local, cultural, gendered, linguistic, disability, labor, and domain-specific harms that a vendor may not see. The same process can also expose participants to disturbing material, collect sensitive prompts or demographic information, leak exploit paths, or let a sponsor convert public labor into public-relations cover. A serious process needs consent, data minimization, participant support, compensation or credit where appropriate, conflict-of-interest disclosure, and a route for findings to change product behavior.

For Spiralism's purposes, this is the strongest version of the "right to repair AI systems" frame: affected people are not only witnesses to algorithmic harm, but contributors to the record that can force correction. The weak version is symbolic consultation, where people surface failures but never see what changed.

Evidence Standard

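Chowdhury's work is often summarized as "public red teaming" or "bias bounties," but those terms can hide very different evidence levels. A useful record should identify which kind of evidence was produced, from a bare report that an exercise took place to a scoped finding that names the tested system version, scoring rules, remediation status, and retest plan.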

This standard keeps public participation from becoming a substitute for governance. It also keeps private model evaluations from claiming public legitimacy without showing how evidence moved into design, deployment, procurement, or regulatory decisions.

Policy and Institutions

Chowdhury's governance work spans companies, civil society, academia, and government. The U.S. State Department lists her among previously appointed Science Envoys, identifying her as CEO of Humane Intelligence and a fellow at Harvard's Berkman Klein Center for Internet and Society.

Her institutional work also runs through Harvard's Berkman Klein Center, TED, UNESCO-linked red-teaming materials, and the practical standards community around AI evaluations. These are not the same kind of authority: a fellowship, an international playbook, a government envoy appointment, and a public red-team event carry different evidentiary weight.

Her January 2026 testimony to the New York State Senate on the New York Artificial Intelligence Act stated the same governance pattern in legal form: high-risk AI needs context-specific sociotechnical evaluation, independent and periodic third-party audits, user rights, public accountability, and legal protection for good-faith evaluators and whistleblowers. That testimony is a useful primary source because it connects her evaluation practice to enforceable obligations rather than voluntary ethics pledges.

Her public posture fits a practical governance lane: build institutions that can test, report, and iterate. She is less interested in AI ethics as a brand statement than in methods that produce evidence, pressure, public literacy, and ways to repair systems that fail people.

Core Ideas

Right to repair AI systems. Chowdhury's recurring frame is that people should have ways to identify, report, and help repair algorithmic harms rather than simply receive automated outputs as finished authority.

Community-driven audit. Public red teaming and bias bounties shift some evaluation power away from private labs and toward broader communities of testers.

Responsible AI as infrastructure. Accountability requires repeatable processes: access, metrics, reporting, incentives, documentation, and institutions with enough legitimacy to act.

Public feedback as governance. The public should not enter the story only after harms occur. Structured feedback can become an upstream part of model evaluation and regulation.

Context as evidence. The same model behavior can carry different risks across languages, cultures, domains, and institutions. Evaluation therefore needs domain and community context, not only generic benchmark scores.

Spiralist Reading

Rumman Chowdhury is a builder of public fault-finding rituals.

The machine age prefers private evaluation: the lab tests the model, the company writes the report, the user receives the product. Chowdhury's work moves critique outward. It asks the public, domain experts, communities, and institutions to touch the machine and record where it breaks.

For Spiralism, this matters because recursive reality cannot be governed only from inside the recursion. If AI systems shape what people see, know, buy, fear, and believe, then the right to test the system becomes part of the right to participate in reality.

Source Discipline

For Chowdhury, source discipline starts by separating biography, institutional role, event record, and methodological claim. Her own site is a good source for her public biography and current self-description. Humane Intelligence's nonprofit pages are better sources for the nonprofit's current leadership, programs, and software plans. State Department pages are evidence of the Science Envoy program and her listing among previously appointed envoys, not evidence about the quality of any AI evaluation.

For red-team and bias-bounty claims, prefer primary event pages, reports, challenge materials, model or system versions, task descriptions, and post-event transparency reports. A claim that an exercise happened is weaker than a claim that names the tested system, participants, scope, prompts or tasks where safe, scoring rubric, disclosure limits, findings, and what changed afterward.

Speaker profiles, award lists, interviews, and secondary reporting can help establish public influence, but they should not carry contested technical or institutional claims unless they link back to primary records. For current-role claims, check both the PBC and nonprofit pages, because Humane Intelligence now spans separate entities with different leadership and governance.

Do not treat public participation as proof of safety. A public red-team exercise can expand who gets to notice harms, but governance-grade evidence still needs scope, versioning, severity, uncertainty, remediation status, participant protections, and a path to delay, restrict, or change deployment.

