Blog · Analysis · Last reviewed June 25, 2026

The Public Comment Bot Enters Rulemaking

When generated comments can imitate civic participation at scale, agencies need better ways to hear publics without mistaking automation for democratic weight.

The central question is not whether people may use tools to participate. It is whether the docket can preserve the difference between human voice, assisted voice, organized voice, falsely attributed voice, and synthetic pressure.

The governed object is the comment pipeline: drafting, authorization, submission, posting, clustering, agency response, and later reuse.

The Comment File

Notice-and-comment rulemaking is one of the quiet machines of democratic government.

An agency proposes a rule. The public can comment. Trade associations, companies, unions, civil-rights groups, states, local governments, researchers, affected workers, families, patients, students, and ordinary residents can put evidence, objections, alternatives, costs, lived experience, and legal arguments into the record. The agency then has to consider the relevant material before finalizing the rule.

This is not town-hall romance. Most people never read the Federal Register. Many comments are organized by advocacy groups. Sophisticated actors already have lawyers, consultants, data, and time. But the comment file still matters because it is one of the places where public reasoning becomes administrative memory. It is not only expression. It is a record that can shape final rules, future litigation, agency learning, and public accountability.

For this essay, a public comment bot is any automated or semi-automated system that generates, varies, submits, coordinates, or falsely attributes comments in a public rulemaking docket. That includes crude spam scripts, campaign tools that personalize templates, agent workflows that draft comments for human review, and systems that submit generated text under names of people who did not authorize it. The problem is not assistance. It is assistance that impersonates public voice or converts scale into false democratic weight.

Docket integrity means preserving enough information to tell four things apart: who authorized a submission, how the text was produced, which channel or campaign carried it, and what kind of contribution it makes to the rulemaking record. That does not require publishing every commenter's identity. It does require agencies to avoid collapsing authorized assistance, organized mobilization, synthetic variation, and identity misuse into one undifferentiated count.

A person using an AI assistant, translator, disability accommodation, lawyer, family member, union, or advocacy template is not the same governance object as a bot farm or a campaign that files generated text under names that never approved it. The bright line is not polished prose. It is authorization, provenance, final review, and whether the record explains what kind of evidence the comment is.

That record now sits in a different media environment. A generated comment can be cheap, fluent, personalized, jurisdiction-specific, and produced in thousands of variants. A fake grassroots campaign can look less like copy-paste spam and more like a field of distinct citizens. The administrative state is being asked to hear the public through a synthetic fog.

Current Context

As of June 25, 2026, the legal core is still the Administrative Procedure Act. 5 U.S.C. § 553 requires general notice of proposed rulemaking, gives interested persons an opportunity to submit written data, views, or arguments, and requires agencies to consider the relevant matter presented before adopting a rule with a concise general statement of its basis and purpose. The statute does not turn comment volume into a vote count.

The public-facing infrastructure is now partly machine-readable. Regulations.gov remains a central federal comment portal, and GSA's public API documentation describes GET endpoints for searching documents, comments, and dockets, plus data-limitations material about publicly viewable and agency-configurable comment fields. That makes the comment file both a civic participation channel and a dataset that researchers, campaigns, agencies, vendors, and automated tools can query and process at scale. It also means docket integrity is a data-governance problem, not only a web-form problem.
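To make "machine-readable" concrete, the sketch below builds a GET URL for a comment search without sending it. The base URL and the `filter[commentOnId]`, `page[size]`, `page[number]`, and `api_key` parameter names are assumptions about the v4 API and should be checked against the current documentation. The object ID and the key are placeholders.

```python
from urllib.parse import urlencode

# Assumed base URL for the Regulations.gov v4 API; confirm against current docs.
BASE_URL = "https://api.regulations.gov/v4"


def comment_search_url(document_object_id: str, api_key: str,
                       page_size: int = 25, page_number: int = 1) -> str:
    """Build a GET URL listing comments filed on one document.

    Parameter names are assumptions about the v4 API, not verified here.
    """
    params = {
        "filter[commentOnId]": document_object_id,
        "page[size]": page_size,
        "page[number]": page_number,
        "api_key": api_key,
    }
    # urlencode percent-encodes the square brackets in the parameter names.
    return f"{BASE_URL}/comments?{urlencode(params)}"


# Placeholder values only; no request is sent.
url = comment_search_url("EXAMPLE-OBJECT-ID", api_key="DEMO_KEY")
print(url)
```

Anything that can be fetched this way by a researcher can be fetched the same way by a campaign vendor or an automated tool, which is why the docket behaves like a dataset as well as a form.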

The main federal guidance for mass, computer-generated, and falsely attributed comments is still the Administrative Conference of the United States' Recommendation 2021-1, published in the Federal Register in July 2021. It recommends de-duplication practices, flagging or separate handling for identified computer-generated comments, docket notes when comments are removed or treated separately, paths for people to report falsely attributed comments, public explanation of comment policies, and continuing technical coordination and training. It is useful guidance, not binding law, and it was written before ordinary campaign tools could generate thousands of non-identical civic narratives.

That gap matters because OIRA's 2023 regulatory-participation memorandum and OMB's January 15, 2025 M-25-07 memorandum both push agencies toward broader, more equitable engagement, especially for communities that are affected by government decisions but hard for ordinary agency processes to hear. The current task is therefore two-sided: protect docket integrity without turning anti-bot controls into barriers for people who already find rulemaking hard to enter.

OMB's M-23-22 digital-service memorandum adds the interface constraint. Public-facing government digital services are expected to be accessible, usable, secure, clearly branded, and understandable. A rulemaking portal that defeats bots by defeating screen readers, shared devices, low-bandwidth users, limited-English commenters, privacy-protective browsers, or confused first-time participants has protected the form while weakening participation.

Fake Consensus Before AI

The problem did not begin with generative AI.

The clearest modern warning came from the Federal Communications Commission's 2017 net-neutrality proceeding. After the FCC received a record-breaking volume of comments, investigations found massive comment fraud. The New York Attorney General's 2021 report and press materials said the investigation found that nearly 18 million of the more than 22 million comments in the FCC proceeding were fake, along with half a million fake letters sent to Congress. The report concluded that real people's names and addresses were used without consent and described separate campaigns that generated large numbers of comments with misleading tactics and little meaningful consent from the people named.

The Government Accountability Office added a broader administrative warning. GAO found that the APA does not require commenters to disclose identity information and does not require agencies to verify commenters' identities. It also found that agency practices for posting duplicate comments and associated identity information varied considerably and were not always clearly communicated. The result is a public record that can be misread: a docket may show a representative duplicate, every duplicate, anonymous comments, named comments, or inconsistent identity fields depending on agency practice.

This older case matters because it separates two questions that are often confused. First: did real people participate? Second: did the comments add relevant information to the rulemaking record? A million authentic but identical slogans may have political meaning. They may not add a million units of legal or technical evidence. A smaller set of detailed comments from affected people may matter more for agency reasoning than a larger pile of automated agreement.

Generative AI does not invent astroturf. It lowers the cost of making astroturf less obvious.

What Generative AI Changes

Earlier mass-comment campaigns often had visible fingerprints: identical text, repeated templates, suspicious email domains, mismatched identities, or a sudden flood from a single organizing pathway. Those signals are still useful. They are no longer enough.

Large language models can rewrite the same position in many styles. They can add local details, vary tone, cite agency language, summarize technical claims, and generate plausible personal narratives. A campaign can prompt for comments from simulated small-business owners, parents, nurses, veterans, farmers, gig workers, tenants, teachers, or rural broadband customers. Some of those simulated comments may include invented experiences. Others may remix real grievances collected elsewhere. Still others may be posted by real people who consented to a one-click campaign without reading the generated text attributed to them.

The hard case is not crude spam. The hard case is model-mediated participation: a real person clicks a link, selects a position, perhaps enters a few details, and an AI system generates the comment. Is that public input? Yes, partly. Is it the person's own testimony? Not necessarily. Does it deserve the same treatment as a handwritten account of direct experience? No. Does it deserve deletion? Also not automatically.

The system now has more categories than "real" and "fake." There are human-authored comments, template comments, assisted comments, AI-drafted comments reviewed by a human, AI-generated comments barely reviewed by a human, falsely attributed comments, bot-submitted comments, and comments produced by organizations that speak for members with varying levels of consent.

A serious rulemaking system has to preserve those distinctions. Otherwise it will either overcount synthetic participation or overcorrect by treating ordinary people as suspicious unless they write like lawyers. It also has to separate style from source. A fluent comment may be fabricated. A short, awkward, translated, or template-assisted comment may contain real experience.

The civic status of a comment therefore depends on authorship, authorization, and evidentiary contribution. A human can use AI to draft a lawful, useful comment if they review it, own it, and correct it. An organization can mobilize members if the members understand what is being submitted in their names. A bot can also generate a plausible comment that no human experienced, reviewed, or authorized. The docket needs metadata and review practices that keep those cases from blending into one count or one "AI" label.

Notice and Comment Is Not a Vote

One protection is doctrinal: notice-and-comment rulemaking is not supposed to be a plebiscite.

The Administrative Conference of the United States has emphasized that agencies should focus on the substantive content of comments rather than treating comment counts as votes. A large volume of identical or near-identical comments can signal public salience, but agencies must still reason from evidence, statutory authority, technical feasibility, costs, benefits, alternatives, equity, rights, and the factual record.

That principle becomes more important as comments become easier to generate. If an agency treats volume as democratic proof, it invites automation to become representation. If a company, advocacy network, or foreign influence operation can manufacture 500,000 plausible comments, then the rulemaking interface becomes a scoreboard for whoever can generate more civic-looking text.

But the opposite error is also dangerous. Agencies should not dismiss mass participation simply because it is organized. Organized comments may represent real constituencies. Template campaigns can help people participate when they lack time, legal knowledge, or confidence. Disability groups, tenants, patients, workers, parents, and small organizations often need scaffolding to enter administrative processes dominated by professionals.

The standard should not be hostility to coordination. The standard should be clarity about what kind of input each comment provides: evidence, legal argument, lived experience, organizational position, member mobilization, political signal, or synthetic noise. This is the same source-separation discipline used in the site's Synthetic Consensus Firebreak: repeated agreement is not independent evidence unless the sources, methods, and incentives are independent.

The Legibility Trap

AI-generated comments are tempting because they make participation legible in the format the institution already accepts: text in a docket.

That is also the trap. A docket rewards written fluency. It privileges people and organizations that can translate experience into the language of reasons, citations, burdens, benefits, and statutory hooks. Generative AI seems to democratize that fluency. It can help a nurse explain a staffing burden, a tenant explain a housing rule, or a small business explain compliance costs. Used well, it can make the administrative process less exclusionary.

Used badly, it creates a simulation of participation. The agency sees clean prose, not the consent process that produced it. It sees a name, not whether the named person reviewed the comment. It sees a narrative, not whether the event happened. It sees geographic spread, not whether the campaign generated local color from a prompt. It sees civic texture, not the machine that manufactured texture. A provenance label can help classify this material, but as with synthetic media, provenance is not a truth machine.

This is model-mediated knowledge at the institutional edge. The agency is not only reading what the public says. It is reading what software says the public might plausibly say.

The deepest risk is not that every generated comment is false. The risk is that the comment file becomes a synthetic public: a record of apparent participation that future institutions, courts, journalists, and researchers may treat as evidence of public experience.

The Docket Integrity Ledger

The practical answer is not to publish more personal information. It is to keep a better evidence model. A docket integrity ledger is a structured, purpose-limited record that lets an agency, inspector general, researcher, court, or affected person reconstruct how a comment entered the file without turning ordinary commenters into publicly searchable identity objects.

At minimum, the ledger should preserve docket ID, comment ID, submission channel, submitter type, sponsoring organization where applicable, campaign or bulk-upload identifier, template family, AI-drafting disclosure, human-review attestation, consent receipt for comments filed on behalf of a person, duplicate or near-duplicate cluster, falsely attributed or identity-dispute flag, withdrawal or anonymization request, triage category, deweighting or exclusion decision, final-rule response category, public API exposure status, and retention rule.
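One way to picture the ledger is as a typed record with one field per item in the list above. The sketch is illustrative only: the class name, field names, example values, and the choice of optional versus required fields are assumptions, not an agency schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class LedgerEntry:
    """One docket integrity ledger row. All names here are hypothetical."""
    docket_id: str
    comment_id: str
    submission_channel: str                      # e.g. "web_form", "bulk_upload", "api"
    submitter_type: str                          # e.g. "individual", "organization"
    sponsoring_organization: Optional[str] = None
    campaign_id: Optional[str] = None            # campaign or bulk-upload identifier
    template_family: Optional[str] = None
    ai_drafting_disclosed: Optional[bool] = None  # None means not asked or not answered
    human_review_attested: Optional[bool] = None
    consent_receipt_id: Optional[str] = None
    duplicate_cluster_id: Optional[str] = None
    attribution_disputed: bool = False
    withdrawal_requested: bool = False           # withdrawal or anonymization request
    triage_category: Optional[str] = None
    weighting_decision: Optional[str] = None     # e.g. "deweighted", "excluded"
    final_rule_response_category: Optional[str] = None
    exposed_via_public_api: bool = False
    retention_rule: Optional[str] = None


entry = LedgerEntry(
    docket_id="EXAMPLE-2026-0001",
    comment_id="EXAMPLE-2026-0001-0042",
    submission_channel="bulk_upload",
    submitter_type="individual",
    sponsoring_organization="Example Tenants Coalition",
    campaign_id="campaign-7",
    template_family="template-A",
    ai_drafting_disclosed=True,
    human_review_attested=True,
)
row = asdict(entry)  # plain dict, ready for storage or export
```

Keeping "not answered" (`None`) distinct from "no" (`False`) matters here: an undisclosed field is not evidence that no AI was used.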

Some fields should be public. Others should be restricted. Sponsor identity, template-family counts, duplicate clusters, AI-assistance categories, bulk-channel use, and docket-level integrity summaries can usually be reported without doxxing individuals. Raw email addresses, phone numbers, home addresses, accessibility needs, assistive-technology status, disability accommodations, and identity-dispute evidence should be minimized, access-controlled, and retained only as long as the accountability purpose requires. That is the same receipt logic used in agent action logs and the same restraint required by data minimization.
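The public and restricted split can be sketched as an allow-list: only named fields are published, so any new field is private until someone decides otherwise. The field names and the particular split below are assumptions for illustration. An agency would set them by policy.

```python
# Hypothetical allow-list of fields that may appear in the public docket.
PUBLIC_FIELDS = {
    "docket_id", "comment_id", "sponsoring_organization", "template_family",
    "duplicate_cluster_id", "ai_assistance_category", "submission_channel",
}


def public_view(record: dict) -> dict:
    """Return only allow-listed fields; everything else stays restricted."""
    return {key: value for key, value in record.items() if key in PUBLIC_FIELDS}


record = {
    "docket_id": "EXAMPLE-2026-0001",
    "comment_id": "EXAMPLE-2026-0001-0042",
    "sponsoring_organization": "Example Tenants Coalition",
    "template_family": "template-A",
    "email": "person@example.org",          # restricted
    "home_address": "1 Example Street",     # restricted
    "assistive_technology": "screen reader",  # restricted
}
published = public_view(record)
```

An allow-list fails closed. A deny-list would publish any sensitive field that nobody remembered to block.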

The ledger also has to travel downstream. If an agency clusters 80,000 comments, removes known falsely attributed comments, deweights suspected automation, or relies on AI-assisted testimony, the final rule and public data exports should preserve enough labels for later readers to understand the transformation. Otherwise the docket becomes a clean-looking public register with the provenance stripped out. The point is not perfect certainty. It is an audit trail strong enough to distinguish evidence, mobilization, assistance, fraud, and synthetic pressure.

Failure Modes

The first failure mode is false attribution. A comment is submitted under a real person's name, address, email, or organizational identity without meaningful authorization. This is the net-neutrality lesson in its cleanest form: the record says a person spoke, but the person did not.

The second is salience laundering. A campaign treats thousands of generated or lightly varied submissions as proof that a concern is widespread, while the real source is one prompt, one sponsor, one list, or one vendor workflow.

The third is fluency bias. Agencies, journalists, and courts may give clean generated prose more attention than short, translated, misspelled, template-assisted, or oral-to-written comments from affected people. Good writing is not the same thing as evidentiary weight.

The fourth is detector overreach. Agencies may respond to generated comments with opaque AI-writing detectors, aggressive CAPTCHA gates, device checks, or identity requirements that create false positives and chill participation by people with disabilities, shared devices, language barriers, privacy concerns, or low bandwidth.

The fifth is source collapse. A docket, agency summary, media article, or later research dataset may combine unique testimony, template support, AI-assisted drafts, duplicates, bot submissions, and falsely attributed comments without preserving their different evidentiary status.

The sixth is consent thinning. A person clicks "support," signs a petition, or joins an advocacy list, and later a full generated comment is submitted in their name even though they never saw the final text, data claims, legal argument, or disclosure attached to it.

The seventh is record contamination. Public comments are scraped into training data, civic research datasets, agency summaries, or litigation records without labels showing that some portion was generated, duplicated, coordinated, or disputed. The synthetic public then becomes raw material for future systems.

The eighth is bulk-lane opacity. Agencies accept campaign feeds, batch uploads, or API-mediated comments without sponsor metadata, campaign identifiers, consent evidence, rate controls, abuse contacts, or a public explanation of how bulk comments are represented in the docket.

The ninth is accessibility collateral damage. Anti-bot gates make the portal harder for screen-reader users, people on shared devices, people with limited English, people using privacy-protective browsers, and low-bandwidth commenters, so the system protects itself by excluding the public it most needs to hear.

The tenth is ledger erasure. The agency does careful internal triage, but the final rule, API export, or archived docket exposes only a flattened comment count. Future readers then see a cleaner public record than the agency actually had.

A Governance Standard

Agencies should not try to solve this with a fantasy of perfectly pure authorship. Public participation has always included drafting help, coalition campaigns, lawyers, templates, translators, family members, staffers, and advocacy software. The goal is not to ban assistance. The goal is to keep assistance from impersonating public voice.

First, require clear attribution for organized and AI-assisted campaigns. Comment portals and campaign tools should let submitters disclose whether text was drafted by an organization, generated by AI, substantially edited by a person, or submitted as a template. The disclosure should be structured enough for agencies to analyze, not buried in prose. A useful schema separates sponsor, submitter, text-generation method, human-review attestation, template family, and submission channel.
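A structured disclosure of this kind might look like the sketch below, which uses a closed set of text-generation methods so that agencies can count categories instead of parsing prose. The enum values, class name, and form keys are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TextMethod(Enum):
    SELF_WRITTEN = "self_written"
    ORGANIZATION_DRAFTED = "organization_drafted"
    TEMPLATE = "template"
    AI_GENERATED = "ai_generated"
    AI_GENERATED_HUMAN_EDITED = "ai_generated_human_edited"


@dataclass(frozen=True)
class Disclosure:
    sponsor: Optional[str]
    submitter: str
    text_method: TextMethod
    human_review_attested: bool
    template_family: Optional[str]
    submission_channel: str


def parse_disclosure(form: dict) -> Disclosure:
    """Validate a submitted form; an unknown text method raises ValueError."""
    return Disclosure(
        sponsor=form.get("sponsor") or None,
        submitter=form["submitter"],
        text_method=TextMethod(form["text_method"]),
        human_review_attested=bool(form.get("human_review_attested", False)),
        template_family=form.get("template_family") or None,
        submission_channel=form["submission_channel"],
    )


disclosure = parse_disclosure({
    "submitter": "individual",
    "sponsor": "Example Nurses Association",
    "text_method": "ai_generated_human_edited",
    "human_review_attested": True,
    "submission_channel": "campaign_tool",
})
```

Disclosure of this kind is self-reported. It supports analysis, but it does not verify anything by itself.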

Second, separate identity validation from public exposure. Agencies need stronger tools to detect falsely attributed comments, bot submissions, and repeat abuse. But public dockets should not force ordinary commenters to expose unnecessary personal data. Authentication should protect participation, not become an identity gate that chills it.

Third, classify comments by contribution type. A docket should distinguish unique evidence, unique personal experience, technical analysis, legal argument, template support, template opposition, duplicate text, and suspected automated or falsely attributed material. Counts can be reported, but they should not substitute for analysis.

Fourth, preserve human accessibility. If agencies harden the portal against automation, they must not make participation impossible for people with low bandwidth, disability access needs, limited English, shared devices, privacy concerns, or little bureaucratic literacy. Anti-bot design can quietly become anti-public design.

Fifth, audit campaign intermediaries. The FCC case showed that the weak point may be outside the agency portal: lead generators, advocacy vendors, list brokers, and political firms that collect names or submit comments on people's behalf. Consent records matter. So do penalties for knowingly submitting comments under names of people who did not authorize them.

Sixth, use AI defensively but cautiously. Agencies can use clustering, duplicate detection, language analysis, and anomaly detection to manage large dockets. They should not let detection models become unreviewable filters that discard inconvenient publics. Any automated triage should be logged, contestable, and checked by humans.
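As a minimal sketch of logged, reviewable triage, the code below groups near-duplicate comments by word-shingle overlap (Jaccard similarity) and records every grouping decision instead of deleting anything. The shingle size, the 0.5 threshold, the greedy single-pass strategy, and the sample comments are all assumptions. Paraphrased or model-varied text would need stronger methods than this.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Lowercased word k-grams; texts shorter than k words yield one shingle."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_comments(comments: dict, threshold: float = 0.5):
    """Greedy clustering: a comment joins the first cluster whose
    representative is similar enough, otherwise it starts a new cluster.
    Returns (clusters, log). No comment is removed."""
    representatives = []  # (cluster_id, shingles of the cluster's first comment)
    clusters = {}
    log = []
    for comment_id, text in comments.items():
        current = shingles(text)
        for cluster_id, rep in representatives:
            score = jaccard(current, rep)
            if score >= threshold:
                clusters[cluster_id].append(comment_id)
                log.append({"comment_id": comment_id, "cluster": cluster_id,
                            "score": round(score, 2), "human_reviewed": False})
                break
        else:
            cluster_id = f"cluster-{len(representatives) + 1}"
            representatives.append((cluster_id, current))
            clusters[cluster_id] = [comment_id]
    return clusters, log


comments = {
    "c1": "I oppose the proposed rule because it raises costs for small landlords.",
    "c2": "I oppose the proposed rule because it raises costs for small tenants.",
    "c3": "My clinic lost two nurses last year after the staffing change.",
}
clusters, log = cluster_comments(comments)
```

The log is the important output. A cluster label with a recorded score and a pending review flag can be contested, while a silent filter cannot.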

Seventh, publish docket-integrity summaries. For high-volume rulemakings, agencies should explain how many comments were received, how duplicates were handled, what validation steps were used, what categories were reviewed, what was excluded or deweighted, and what substantive themes affected the final rule.
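A docket-level summary of this kind is an aggregation over per-comment labels. The sketch below assumes each comment already carries a contribution type, a handling decision, and an optional cluster label. The label vocabulary and key names are hypothetical.

```python
from collections import Counter


def integrity_summary(entries: list) -> dict:
    """Aggregate per-comment labels into docket-level counts."""
    return {
        "received": len(entries),
        "by_contribution_type": dict(Counter(e["contribution_type"] for e in entries)),
        "by_handling": dict(Counter(e["handling"] for e in entries)),
        "distinct_clusters": len({e["cluster"] for e in entries if e["cluster"]}),
    }


entries = [
    {"contribution_type": "unique_experience", "handling": "posted", "cluster": None},
    {"contribution_type": "template_support", "handling": "posted", "cluster": "cluster-1"},
    {"contribution_type": "template_support", "handling": "posted", "cluster": "cluster-1"},
    {"contribution_type": "falsely_attributed", "handling": "removed", "cluster": None},
]
summary = integrity_summary(entries)
```

The counts describe how the record was handled. They do not measure support or opposition, and they do not replace the substantive analysis.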

Eighth, require consent receipts for comments submitted on behalf of people. Campaign tools should show the final text before submission, record what the person authorized, identify the sponsoring organization, preserve timestamped consent evidence, and provide a way to correct or withdraw falsely attributed comments after the deadline.
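A consent receipt can be sketched as a small record that binds a person's authorization to the exact text they were shown, using a hash of that text. The function names, field names, and the pseudonymous `person_ref` are assumptions. A production design would also need signing and secure storage, which are omitted here.

```python
import hashlib
from datetime import datetime, timezone


def text_digest(final_text: str) -> str:
    """SHA-256 of the exact text, so any later edit changes the digest."""
    return hashlib.sha256(final_text.encode("utf-8")).hexdigest()


def make_receipt(person_ref: str, sponsor: str, final_text: str) -> dict:
    return {
        "person_ref": person_ref,  # pseudonymous reference, not raw identity data
        "sponsor": sponsor,
        "text_sha256": text_digest(final_text),
        "consented_at": datetime.now(timezone.utc).isoformat(),
    }


def receipt_covers(receipt: dict, submitted_text: str) -> bool:
    """True only if the submitted text is identical to the text consented to."""
    return receipt["text_sha256"] == text_digest(submitted_text)


shown = "I support the proposed staffing rule because my unit is short two nurses."
receipt = make_receipt("person-ref-001", "Example Nurses Association", shown)
```

The receipt shows that the filed text matches what the tool displayed. It cannot show that the person read it, so it supports the consent process without replacing it.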

Ninth, preserve campaign provenance without overexposing individuals. Agencies and large submitters should retain structured metadata about template families, AI drafting, model or tool use, prompt classes, submission channels, sponsor identity, and version history. The public docket can disclose aggregate provenance while protecting ordinary commenters from unnecessary personal exposure.

Tenth, keep research access alive. Machine-readable docket data helps journalists, researchers, watchdogs, and affected communities find patterns in mass comments. Integrity hardening should not make the record less inspectable. Public APIs, bulk access, and documentation should be treated as transparency infrastructure, with privacy protections and rate limits where needed.

Eleventh, do not rely on AI detectors as adjudicators. Detection tools can support triage, but they are probabilistic, model-dependent, and vulnerable to false positives and false negatives. A person should not lose access to the rulemaking process because their comment resembles generated text.

Twelfth, describe in final rules how comment evidence was weighed. Agencies should explain how they handled duplicate campaigns, AI-assisted comments, suspected automation, reports of false attribution, substantive evidence, and lived-experience testimony. A final rule should not turn a messy docket into a clean count without explaining the transformation.

Thirteenth, preserve the response path for falsely attributed people. A person who discovers a comment filed in their name should be able to report it, see whether the agency marked or removed it, and preserve enough evidence for investigation without being forced to expose more personal information than the correction requires.

Fourteenth, govern bulk submission channels explicitly. If an agency accepts batch uploads, campaign feeds, or API-mediated comments, it should require sponsor identity, campaign ID, template-family metadata, final-text review attestation, rate limits, abuse contacts, and a correction path. A governed bulk lane is better than hidden automation because legitimate organizers get a lawful path and agencies get metadata for source separation.
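The intake check for a governed bulk lane might be sketched as follows. The required field names and the per-batch cap are hypothetical, and the cap is only a crude stand-in for real rate limiting, which would track submissions over time.

```python
# Hypothetical metadata every bulk submission must carry.
REQUIRED_BATCH_FIELDS = (
    "sponsor", "campaign_id", "template_family",
    "final_text_review_attested", "abuse_contact",
)
MAX_COMMENTS_PER_BATCH = 1000  # assumed cap, not a real agency limit


def validate_batch(batch: dict) -> list:
    """Return a list of problems; an empty list means the batch may proceed.

    A missing field and a falsy value (such as an attestation of False)
    are both reported as missing.
    """
    problems = [f"missing {name}" for name in REQUIRED_BATCH_FIELDS
                if not batch.get(name)]
    count = len(batch.get("comments", []))
    if count == 0:
        problems.append("no comments in batch")
    elif count > MAX_COMMENTS_PER_BATCH:
        problems.append(f"batch of {count} exceeds limit of {MAX_COMMENTS_PER_BATCH}")
    return problems


good_batch = {
    "sponsor": "Example Tenants Coalition",
    "campaign_id": "campaign-7",
    "template_family": "template-A",
    "final_text_review_attested": True,
    "abuse_contact": "abuse@example.org",
    "comments": ["comment text 1", "comment text 2"],
}
bad_batch = {"campaign_id": "campaign-8", "final_text_review_attested": False,
             "comments": []}
```

Returning every problem at once, instead of stopping at the first one, gives a legitimate organizer a complete list to fix.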

Fifteenth, publish the transformation from docket to rule. Final rules and response-to-comment documents should identify how the agency handled duplicates, templates, AI-assisted comments, suspected automation, falsely attributed comments, withdrawn comments, and substantive evidence. The public should be able to see the difference between receipt, posting, triage, reliance, and legal response.

Sixteenth, keep provenance with downstream datasets. Public APIs, bulk downloads, archives, research datasets, litigation files, and agency summaries should carry labels for duplicate clusters, campaign families, AI-assistance categories, disputed attribution, removal status, and retention limitations where those labels exist. A provenance field that disappears at export is not governance; it is an internal note.
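One way to keep labels from disappearing at export is to make the provenance columns part of the export format itself, and to write an explicit "unknown" where a label is absent. The column names below are assumptions, and the sketch uses only the standard `csv` module.

```python
import csv
import io

# Hypothetical provenance columns that every export must carry.
PROVENANCE_COLUMNS = [
    "duplicate_cluster", "campaign_family", "ai_assistance_category",
    "attribution_disputed", "removal_status",
]


def export_csv(rows: list) -> str:
    """Write comments with provenance columns always present.

    Absent labels are exported as 'unknown' instead of being dropped.
    Keys outside the declared columns are ignored.
    """
    columns = ["comment_id", "text"] + PROVENANCE_COLUMNS
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="unknown",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


exported = export_csv([
    {"comment_id": "c1", "text": "hello", "duplicate_cluster": "cluster-1"},
])
print(exported)
```

An explicit "unknown" tells a downstream reader that the label was never assigned, which is different from a column that silently does not exist.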

Seventeenth, make exclusion reviewable. If a comment is removed, hidden, deweighted, clustered, or routed into a suspected-automation bucket, the action should be logged, reason-coded, and contestable where practical. Silent discard is dangerous because it can hide both fraud control and viewpoint suppression.
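A reviewable exclusion record can be sketched as an append-only log in which every action needs a reason code and a contest is simply another entry. The action names, reason codes, and class name are hypothetical. Real systems would also need durable, tamper-evident storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Action(Enum):
    REMOVED = "removed"
    HIDDEN = "hidden"
    DEWEIGHTED = "deweighted"
    CLUSTERED = "clustered"
    SUSPECTED_AUTOMATION = "suspected_automation"
    CONTESTED = "contested"


@dataclass
class TriageLog:
    """In-memory append-only log: entries are added, never edited."""
    entries: list = field(default_factory=list)

    def record(self, comment_id: str, action: Action,
               reason_code: str, actor: str) -> dict:
        if not reason_code:
            raise ValueError("every triage action needs a reason code")
        entry = {
            "comment_id": comment_id,
            "action": action.value,
            "reason_code": reason_code,
            "actor": actor,
            "at": datetime.now(timezone.utc).isoformat(),
        }
        self.entries.append(entry)
        return entry

    def history(self, comment_id: str) -> list:
        """All actions taken on one comment, in the order recorded."""
        return [e for e in self.entries if e["comment_id"] == comment_id]


log = TriageLog()
log.record("c9", Action.DEWEIGHTED, "NEAR_DUPLICATE", "reviewer-1")
log.record("c9", Action.CONTESTED, "SUBMITTER_DISPUTE", "submitter")
history = log.history("c9")
```

Because a contest is logged alongside the original action, a later reader can see both the decision and the objection to it.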

What This Changes

The public comment bot is a small machine with a large symbolic effect.

It sits at the hinge between voice and record. A person speaks, or appears to speak. A system stores the statement. An agency reads the file. A rule is justified. A court later examines the record. The public memory of participation becomes part of the law's reality.

That is why synthetic comments matter. They do not merely pollute a website. They threaten the chain that lets institutions say, with some humility, that they listened.

AI can help people enter that chain. It can translate bureaucratic language, summarize proposed rules, draft first versions, improve accessibility, and help non-experts make relevant points. That use expands agency. The danger begins when the model becomes the participant, the campaign becomes the public, and the record becomes a mirror reflecting organized prompts back to the state.

Recursive reality appears here in administrative form. Agencies ask the public what a rule will do. Automated systems generate public-looking answers. Agencies summarize those answers. Future actors cite the summary as evidence of public concern. The generated public becomes part of the institutional world it was generated to influence.

The answer is not nostalgia for paper comments or suspicion of ordinary assistance. It is source discipline for democratic input. Who authorized this comment? What role did a model play? Is the statement evidence, argument, template support, or synthetic pressure? What did the agency actually learn?

A comment system that cannot ask those questions will still collect text. It will not reliably hear the public.

Source Discipline

Public-comment claims need careful source separation. The APA establishes the opportunity to submit written data, views, or arguments and the agency duty to consider relevant matter; it does not create a one-comment-one-vote election. Regulations.gov and its API documentation establish the public infrastructure and machine-readable access paths; they do not prove any docket is complete, representative, or free of fraud. ACUS Recommendation 2021-1 is administrative best-practice guidance, not a statute and not a generative-AI-specific rule.

The New York Attorney General report is strong evidence about the 2017 FCC net-neutrality fake-comment episode, identity misuse, and campaign mechanics. It should not be stretched into proof that every large campaign is fraudulent. GAO reports are useful for the cross-agency management problem: identity fields, duplicate handling, agency practices, and public communication varied, and the APA does not require identity verification. OIRA and OMB participation materials point toward broader engagement; they should not be used to justify weakening docket-integrity controls.

AI-specific claims should name the layer: generation, human review, campaign coordination, submission channel, identity attribution, agency triage, public posting, final-rule response, or later research reuse. A comment can be AI-assisted and still authorized. A comment can be human-submitted and still misleading. A comment can be duplicated and still signal organized concern. The record should preserve those differences instead of using "bot" as a catch-all label.

Current-source claims in this article were checked against the named sources on June 25, 2026. The review treats the U.S. Code as statutory text, Federal Register materials as official publication records, ACUS as administrative guidance, GAO as audit evidence, state-attorney-general materials as investigative findings, Regulations.gov and GSA as infrastructure documentation, and OMB memoranda as federal executive-branch guidance for covered agencies.
