Blog · Analysis · Last reviewed June 23, 2026

The Synthetic Respondent Becomes the Public

Synthetic respondents promise cheap public opinion. The danger is that institutions start listening to a model of the public instead of the public itself.

For this essay, a synthetic public is a generated or imputed record presented as evidence about a group that was not actually contacted in the claimed research act.

From Polling to Silicon Sampling

The survey is one of the plainest democratic technologies: ask people what they think, record the answers, disclose the method, and accept that the public is not identical to the people who are loudest, richest, most online, or easiest to reach.

It was never perfect. Polling has nonresponse problems, weighting fights, question-order effects, mode effects, panel fatigue, bad respondents, poor incentives, and serious failures of prediction. But the moral structure is clear. A poll claims authority because real people were asked.

Synthetic respondents change that structure. Instead of interviewing people, a researcher or company prompts a large language model to act as many people: a suburban mother, a young Republican, a retired union member, a low-income renter, a coffee buyer, a climate skeptic, a swing voter, a lapsed Catholic, a warehouse worker, a college student. The model generates answers. The answers are aggregated. The result is presented as a possible view of the public.

The bright-line definition matters: a synthetic respondent is not a respondent. It is a model output designed to approximate how a person or subgroup might answer. It has no recruitment event, sampling probability, consent encounter, right to withdraw, right to correct the record, or lived stake in the result unless a human research process supplies those things around it. It is also not the same thing as a survey weight, a classical imputation, a persona for product design, or a human answer assisted by translation or accessibility tools. The governance question begins when generated answers are treated as evidence of public opinion, consumer demand, worker voice, patient preference, voter sentiment, or affected-community consent.

The practice is often called silicon sampling, synthetic sampling, synthetic users, or AI-generated survey response. The term "silicon sampling" comes from a 2023 Political Analysis study by Lisa Argyle and colleagues, "Out of One, Many," which conditioned GPT-3 on thousands of real respondents' demographic backstories and reported that the model could reproduce the response distributions of human subgroups with what the authors named "algorithmic fidelity." That paper is the optimistic origin of the field; nearly all the work since has been an argument over how far the fidelity actually reaches. Silicon sampling sits beside a related but different problem: real surveys being contaminated by respondents or bad actors using AI to produce answers at scale.

Both matter because they attack the same assumption from opposite directions. Silicon sampling says the human respondent may be optional. AI survey fraud says a coherent response may no longer prove that a human respondent exists.

Current Context

As of June 23, 2026, synthetic respondents have moved from a provocative social-science method into a live research-governance problem. Pew Research Center says it does not use AI to tell it what the public thinks. AAPOR released its task-force report on responsible AI integration in survey research on May 8, 2026; the report defines synthetic respondents as simulated survey participants generated by AI models and frames responsible AI use around validity, reliability, sensitivity, performance, and transparency. The 2025 ICC/ESOMAR International Code requires disclosure when AI, synthetic data, or synthetic personas play a significant role in sampling, deployment, analysis, reporting, or interpretation, and it requires the extent of human oversight to be stated.

Within that context, three separate uses should not be collapsed. AI can assist research operations: questionnaire drafting, translation, coding, fraud review, summarization, or analysis. AI can augment existing human data through imputation, retrodiction, or hypothesis generation. Or AI can substitute for human respondents. AAPOR's report describes pre-field diagnostic testing, post-field augmentation or imputation, and synthetic data collection as a substitute for human respondents; it identifies substitution as the riskiest of those core tasks. Assistance and augmentation may be legitimate when disclosed, validated, reviewed, and kept subordinate to human data. Substitution is the dangerous one when a report claims to know a public it did not contact.

The professional direction is not simply "never use AI." It is source separation. AAPOR's transparency framework asks researchers to disclose where AI was in the loop and how human judgment shaped the output; ICC/ESOMAR asks researchers to disclose significant use of AI, synthetic data, synthetic personas, and human oversight. Those are disclosure rules, not magic validation. A labeled simulation can still be a weak basis for a consequential claim.

The practical rule is simple: synthetic respondents can help prepare research, but they cannot confer democratic legitimacy. A model can suggest what to ask. It cannot replace the act of asking people whose lives, rights, work, money, health, neighborhood, or political power are at stake.

Why It Is Tempting

The temptation is obvious. Human research is slow, expensive, and operationally annoying. Recruiting takes time. Representative samples cost money. Rare subgroups are hard to reach. International work adds language and cultural complexity. Qualitative research requires transcription, coding, consent, and interpretation. Market researchers face deadline pressure. Campaigns want signals now. Product teams want user feedback before the product exists.

A language model appears to solve all of this. It can generate thousands of answers in minutes. It can simulate personas that would be hard to recruit. It can produce open-ended responses that sound thoughtful. It can retrodict missing survey answers, stress-test questionnaires, generate hypotheses, and help researchers notice which questions might be confusing before fielding a real survey.

Some research shows legitimate promise. Kim and Lee's work on AI-augmented surveys used repeated General Social Survey data from 1972 to 2021 and reported strong performance for retrodicting missing opinions, while finding more modest results for predicting entirely unasked opinions. That is a useful distinction. Filling a gap inside a historical survey structure is not the same as replacing the act of asking people.

The best case for synthetic respondents is not that they are the public. It is that they can be a sandbox: a cheap way to pilot instruments, generate priors, explore old data, or identify where a real study should spend scarce attention. In evaluation terms, this means synthetic respondents are test material unless validated against the human population and decision context they are supposed to represent. In privacy terms, it also means the human data used to condition or validate simulations needs consent, minimization, retention limits, and protection from reuse as unlabeled synthetic testimony.

What the Evidence Shows

The evidence base is now large enough to support a disciplined warning.

Pew Research Center's May 2026 Q&A on AI and polling says directly that Pew does not use AI to tell it what the public thinks. Courtney Kennedy, Pew's vice president of methods and innovation, gives two reasons: ethical concern about replacing human voice and scientific concern about how AI estimates behave. Pew's summary of the research flags stereotyping, weaker representation of Republican viewpoints than Democratic ones, and understated disagreement.

AAPOR's task-force report reaches the same governance layer from inside the survey profession. It does not treat AI as a single forbidden tool. It asks where AI enters the lifecycle: questionnaire design, interviewing, coding, analysis, reporting, or the creation of synthetic cases. That distinction is essential because a model that helps clean misspellings is not the same object as a model that produces 10,000 simulated answers and calls them a sample.

A 2024 Political Analysis article by Bisbee, Clinton, Dorff, Kenkel, and Larson tested ChatGPT-generated synthetic respondents against American National Election Study data. The averages sometimes looked close. That is the seductive part. But the synthetic data had less variation, produced different regression results, shifted with prompt wording, and changed across a three-month period. The authors concluded that the synthetic data raised serious quality, reliability, and reproducibility concerns.

A 2025 Sociological Methods & Research article by Boelaert, Coavoux, Ollion, Petev, and Präg makes the warning sharper. It argues that current models cannot replace human subjects for opinion or attitudinal research, and that their answers show strong bias and low variance, with bias changing by topic.

Cross-national work adds another layer. A 2024 Humanities and Social Sciences Communications study found promise in public-opinion simulation, especially where training data is richer, but also highlighted limits in global applicability and reliability, demographic representation, and topic complexity. Its conclusion is not that the method is useless. It is that models inherit the unevenness of the world they learned from and the boundaries of the data available to them.

Market-research tests point in the same direction, while deserving a different evidentiary label than peer-reviewed work. Verasight's January 2026 industry report found that synthetic samples did worse on brand-awareness and product-testing questions than on frequently asked political questions, and warned that subgroup errors can be much larger than topline errors. That result makes intuitive sense: models have more public text about elections than about why a particular household buys one coffee brand, rejects a package design, delays a purchase, or changes a habit after a bad week.

The Variance Problem

The deep failure mode is not only wrong averages. It is fake coherence.

Human publics are messy. People contradict themselves. They answer differently when tired, threatened, hopeful, bored, rushed, embarrassed, or trying to be polite. They hold views that do not fit their demographic profile. They misunderstand questions. They refuse categories. They change their minds. They care intensely about some issues and barely at all about others. They know things a model cannot infer from public text because the knowledge lives in bodies, households, jobs, illnesses, debts, local institutions, memories, and silence.

A synthetic respondent has no rent due, no boss, no church, no commute, no neighborhood, no family obligation, no fear of a doctor's bill, no loyalty to a person who disappointed them, no private embarrassment, no weather, no waiting room, no embodied stake. It has a prompt, a distribution, and a style of plausible answer.

That is why low variance matters politically. If a synthetic public is too internally consistent, it can make social life appear more legible than it is. It can tell a campaign that voters are cleaner ideological types than they are. It can tell a company that consumers reason more consistently than they do. It can tell a public agency that a subgroup is easier to model than to contact.

A larger synthetic sample does not solve that. Generating 50,000 model respondents may reduce random noise inside the model's own distribution, but it does not repair systematic bias in the representation. It can produce a more precise version of the wrong public.
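The arithmetic behind that claim can be sketched in a few lines. This is a minimal illustration, not a result from any study: the 50 percent human share and the 58 percent model share are invented numbers, and `standard_error` and `rmse` are hypothetical helper names. It also treats each synthetic answer as an independent draw from the model's own distribution, which is a generous assumption.

```python
import math

# Hypothetical numbers for illustration only; not taken from any study.
TRUE_HUMAN_SHARE = 0.50   # assumed share of the real public that agrees
MODEL_SHARE = 0.58        # assumed share the model's distribution produces


def standard_error(p: float, n: int) -> float:
    """Sampling error of a proportion estimated from n independent draws."""
    return math.sqrt(p * (1.0 - p) / n)


def rmse(bias: float, se: float) -> float:
    """Root mean squared error combines systematic bias and random noise."""
    return math.sqrt(bias ** 2 + se ** 2)


# Systematic gap between the model's public and the assumed real public.
bias = MODEL_SHARE - TRUE_HUMAN_SHARE

for n in (500, 5_000, 50_000):
    se = standard_error(MODEL_SHARE, n)
    print(f"n={n:>6}  se={se:.4f}  bias={bias:.4f}  rmse={rmse(bias, se):.4f}")
```

Under these assumptions the standard error shrinks by a factor of ten between 500 and 50,000 draws, while the total error never falls below the eight-point bias. More rows buy precision around the model's answer, not accuracy about the public.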

This is also a source-discipline problem. If a report does not preserve model version, prompt, sampling method, grounding data, demographic conditioning, temperature or decoding settings, validation baseline, and human-review record, the synthetic public cannot be audited. It becomes a clean-looking table detached from the machinery that produced it. AAPOR's report adds a practical reason for this discipline: models can be retrained, updated, or realigned without notice, so conclusions about synthetic-response quality can change with time even when the research label stays the same.
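One way to make that discipline checkable is to store the listed items as a structured record and refuse to call a run auditable while any of them is blank. The sketch below is an assumption about how such a record might look, not a schema from AAPOR or any other standard: `SyntheticRunRecord`, `missing_fields`, and `is_auditable` are hypothetical names, and `generated_on` is an added field motivated by the point about unannounced model updates.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SyntheticRunRecord:
    """Hypothetical audit record; field names mirror the list in the text."""
    model_version: Optional[str] = None
    prompt: Optional[str] = None
    sampling_method: Optional[str] = None
    grounding_data: Optional[str] = None
    demographic_conditioning: Optional[str] = None
    decoding_settings: Optional[str] = None    # e.g. temperature
    validation_baseline: Optional[str] = None
    human_review_record: Optional[str] = None
    generated_on: Optional[str] = None         # date of the run; models change


def missing_fields(record: SyntheticRunRecord) -> list[str]:
    """Return the names of audit fields left empty or blank."""
    return [
        f.name
        for f in fields(record)
        if not (getattr(record, f.name) or "").strip()
    ]


def is_auditable(record: SyntheticRunRecord) -> bool:
    """A run counts as auditable only when every field is filled in."""
    return not missing_fields(record)


partial = SyntheticRunRecord(
    model_version="example-model-2026-01",   # placeholder, not a real model
    prompt="You are a survey respondent who ...",
    decoding_settings="temperature=0.7",
)
print(missing_fields(partial))
print(is_auditable(partial))
```

A filled-in record does not show that the synthetic answers are valid. It only makes it possible for someone else to check.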

Fraud and Contamination

The second problem is not researchers intentionally replacing humans. It is humans, bots, or organized actors using AI to pollute surveys that are supposed to contain human responses.

Pew distinguishes probability-based panels from opt-in samples for exactly this reason. In a probability panel, people are selected from a real-world sampling frame and cannot simply nominate themselves into unlimited surveys. In open opt-in systems, bad actors can create fake identities, chase incentives, and use AI to answer many surveys quickly.

Sean Westwood's 2025 PNAS paper makes the contamination problem concrete. It describes an autonomous synthetic respondent designed to read surveys, generate coherent answers, and undermine the old assumption that a plausible response is evidence of a human respondent. The point is not that every online panel is already broken in the same way. The point is that survey integrity now depends on adversarial data-quality controls, not only good-faith questionnaire design.

This is not only a market-research nuisance. Public opinion surveys inform journalism, campaigns, academic studies, public agencies, philanthropy, product design, health communication, and institutional strategy. If the response layer becomes machine-contaminated, downstream institutions may govern a public that was partly fabricated.

The detection problem will not stay still. Attention checks, open-ended questions, style filters, browser paradata, identity checks, and response-time analysis can help, but AI systems adapt. A model can produce plausible open-ended responses, vary tone, preserve a persona, and avoid obvious bot markers. The old assumption that a coherent paragraph implies a sincere human respondent has weakened.

Survey integrity therefore becomes part of AI governance. It is no longer only a methodological issue for specialists. It is part of the infrastructure that tells institutions what the public is. The hardening has to be proportional: identity checks, device signals, behavioral review, and fraud scoring can protect data quality, but they can also create privacy risks or exclude people who are already hard to reach.

This is the survey-side version of the problem raised by public-comment bots and bot disclosure. Institutions need to know when a voice is generated, assisted, duplicated, paid for, or actually attached to a reachable person. Otherwise the interface can count fluency as presence.

The privacy tradeoff is real. Proof-of-person methods, device checks, behavioral telemetry, and panel validation can defend research, but they can also expand surveillance of respondents. Survey hardening should follow data minimization: collect enough evidence to protect the sample, avoid turning participation into a biometric or identity dragnet, and publish aggregate data-quality methods without exposing individual respondents.

The Governance Standard

A serious standard should begin with a bright line: synthetic respondents are not respondents. They are model outputs about possible respondents.

First, disclose synthetic use plainly. Any report using AI-generated survey responses should say so in the headline methodology, not in a technical appendix. It should distinguish human data, imputed data, simulated respondents, model-assisted coding, and AI-generated analysis.

Second, forbid substitution in democratic claims. Political polling, public consultation, community needs assessment, civil-rights impact analysis, worker voice, patient voice, student voice, and affected-community review should not be replaced by silicon samples. A model can help prepare the work. It should not stand in for the people whose consent, needs, and power are at stake.

Third, validate against human ground truth. Synthetic methods should be benchmarked against real survey data for the same domain, population, language, time period, and question type. General claims that a model can simulate people are not enough.

Fourth, report variance and subgroup error. Topline accuracy can hide the very failures that matter for governance. Reports should show dispersion, subgroup performance, sensitivity to prompts, model version, temperature, sampling procedure, and whether results changed over time.

Fifth, keep an audit trail. Reports should preserve the model, prompt, data sources, conditioning variables, decoding settings, date of generation, filtering process, human review, validation results, and known limits. The public should not have to trust a black-box "representative AI sample."

Sixth, protect source respondents. If real survey answers, interviews, panel profiles, or administrative records are used to condition, benchmark, or generate synthetic respondents, the source humans still deserve consent discipline, privacy review, retention limits, and clear restrictions on reuse. A synthetic layer should not launder sensitive human data into an easier-to-share artifact.

Seventh, preserve probability-based human research where legitimacy matters. If a decision claims democratic warrant, it needs contact with real people selected through defensible methods. Convenience is not legitimacy.

Eighth, harden survey infrastructure against AI contamination. Opt-in surveys need stronger identity, recruitment, throttling, behavioral review, fraud detection, and transparent data-quality reporting. Probability panels are not invulnerable, but they begin from a stronger sampling frame. Anti-fraud systems should be privacy-conscious and should be tested for exclusionary effects.

Ninth, protect qualitative surprise. The point of talking to people is not only to get answers to known questions. It is to discover what the institution did not know how to ask. Synthetic respondents are weakest where lived experience, contradiction, and unexpected salience matter most.

Tenth, stop generated publics from becoming training evidence without labels. Synthetic responses used for pilots, augmentation, or simulation should carry provenance when stored or reused, so later models and reports do not absorb generated opinion as if it were fresh human testimony.

Eleventh, separate simulation from consultation. Procurement decks, agency consultations, community assessments, product research, and public reports should not describe a synthetic panel as a consulted public. If simulation informed the work, say simulation. If people were contacted, say who, how, when, under what consent, and with what limits.

Twelfth, require review for consequential uses. Synthetic respondent methods used in public services, health, education, employment, housing, civil rights, elections, or public consultation should trigger human oversight, impact assessment, and source documentation. The relevant governance question is not only whether the model is accurate, but whether the institution had authority to replace listening with simulation.
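The third and fourth points, validation against human ground truth and reporting of variance and subgroup error, can be made concrete with a small comparison. The sketch below uses invented 1 to 5 agreement scores for two made-up subgroups; `validation_report` and its output keys are hypothetical names, and the three measures are one reasonable choice rather than a standard prescribed by any of the bodies cited here.

```python
from statistics import mean, pstdev

# Invented 1-5 agreement scores, for illustration only.
human = {
    "group_a": [1, 2, 4, 5, 5, 3],
    "group_b": [1, 1, 2, 3, 5, 2],
}
synthetic = {
    "group_a": [4, 4, 4, 4, 4, 4],
    "group_b": [2, 2, 2, 2, 1, 1],
}


def pooled(groups: dict[str, list[int]]) -> list[int]:
    """Flatten all subgroup responses into one list."""
    return [score for scores in groups.values() for score in scores]


def validation_report(human: dict, synthetic: dict) -> dict:
    """Compare synthetic answers to a human benchmark on three measures."""
    return {
        # Difference in overall means: the number a headline would quote.
        "topline_error": mean(pooled(synthetic)) - mean(pooled(human)),
        # Difference in means within each subgroup.
        "subgroup_error": {
            g: mean(synthetic[g]) - mean(human[g]) for g in human
        },
        # Below 1.0 means synthetic answers are less spread out than human
        # ones. Assumes each human subgroup has nonzero spread.
        "dispersion_ratio": {
            g: pstdev(synthetic[g]) / pstdev(human[g]) for g in human
        },
    }


report = validation_report(human, synthetic)
for name, value in report.items():
    print(name, value)
```

In this toy data the topline error is zero because the two subgroup errors cancel, and both dispersion ratios are well below one. A report that published only the topline would look perfect while misdescribing each group and flattening the disagreement inside them.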

What This Changes

A synthetic respondent is a mirror pretending to be a witness.

It reflects patterns in language, data, demographic association, and model training. It may be useful. It may even be surprisingly accurate in some narrow cases. But it does not testify. It does not risk anything by answering. It does not have to live under the policy, buy the product, endure the workplace, trust the hospital, send the child to the school, or absorb the consequences of being misunderstood.

The governance danger is recursive. A model is trained on traces of human society. An institution asks the model what humans think. The institution acts on that answer. Humans adapt to the institution. Future traces record the adapted world. The model then appears to have known the public it helped create.

That is synthetic consensus with a spreadsheet. It can look empirical while quietly removing the people from the measurement loop.

The answer is not to ban simulation. Simulation is useful when it is labeled, bounded, validated, and kept subordinate to contact with reality. That is the same public-interest technology standard that applies to other civic systems: the tool must remain answerable to the people it describes. The answer is to preserve the human voice layer where voice is the point.

Polling, interviewing, testimony, user research, fieldwork, worker consultation, patient engagement, and democratic hearing are not merely data-extraction techniques. They are institutional acts of recognition. They say: you are not only represented by a pattern. You may answer for yourself.

When a system replaces that act with generated personas, it has not made research efficient. It has changed who counts as present.

Source Discipline

The minimum discipline is to keep four things separate in the record: a poll of contacted people, an imputation from human survey data, a simulation by generated respondents, and contamination by automated or bad-faith survey-takers. They may all produce rows in a dataset. They do not have the same evidentiary status.
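In a dataset, that separation can be enforced with a mandatory provenance label on every row. The following is a minimal sketch under stated assumptions: the `Provenance` enum, the `provenance` key, and both helper functions are hypothetical names chosen for illustration, and real pipelines would need richer metadata than a single label.

```python
from collections import Counter
from enum import Enum


class Provenance(Enum):
    """The four record types the text says must stay separate."""
    HUMAN_POLL = "poll of contacted people"
    IMPUTED = "imputation from human survey data"
    SIMULATED = "simulation by generated respondents"
    CONTAMINATED = "automated or bad-faith survey-taker"


def provenance_counts(rows: list[dict]) -> Counter:
    """Count rows by provenance; refuse any row that carries no label."""
    for i, row in enumerate(rows):
        if not isinstance(row.get("provenance"), Provenance):
            raise ValueError(f"row {i} has no provenance label")
    return Counter(row["provenance"] for row in rows)


def human_only(rows: list[dict]) -> list[dict]:
    """Keep only rows that came from contacted people."""
    return [r for r in rows if r["provenance"] is Provenance.HUMAN_POLL]


rows = [
    {"answer": "agree", "provenance": Provenance.HUMAN_POLL},
    {"answer": "agree", "provenance": Provenance.SIMULATED},
    {"answer": "disagree", "provenance": Provenance.IMPUTED},
    {"answer": "agree", "provenance": Provenance.HUMAN_POLL},
]
counts = provenance_counts(rows)
print({p.name: n for p, n in counts.items()})
print(len(human_only(rows)), "of", len(rows), "rows come from contacted people")
```

The design choice is to fail loudly on an unlabeled row rather than guess. A table that reports four rows without saying that only two came from contacted people is the clean-looking artifact this essay warns about.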

Claims about scientific capability should rest on peer-reviewed or clearly labeled preprint research, with model version, prompt design, validation data, subgroup error, variance, and time sensitivity preserved. Claims about professional duties should rest on professional standards and survey-methodology guidance. Industry reports can be useful operational evidence, especially when they share methods and comparisons, but they should not be described as neutral proof that a synthetic public has been validated.

A current-source claim needs a current-source date. A May 2026 professional report, a revised preprint, a model-version result, or an industry benchmark can change quickly. This page treats AAPOR and ICC/ESOMAR as professional-standard sources, Pew as an institutional-methods source, peer-reviewed articles as method evidence under their tested conditions, arXiv as preprint evidence, PNAS as a peer-reviewed warning about AI contamination, and Verasight as an industry test with useful but non-neutral operational evidence.

Internally, this page belongs beside synthetic publics, consent for synthetic people, public-comment bots, synthetic consensus, claim hygiene, provenance, human oversight, and algorithmic impact assessments. The same rule runs through all of them: generated social evidence needs labels, source trails, and human accountability before institutions use it to speak for people.
