The Synthetic Respondent Becomes the Public
Synthetic respondents promise cheap public opinion. The danger is that institutions start listening to a model of the public instead of the public itself.
From Polling to Silicon Sampling
The survey is one of the plainest democratic technologies: ask people what they think, record the answers, disclose the method, and accept that the public is not identical to the people who are loudest, richest, most online, or easiest to reach.
It was never perfect. Polling has nonresponse problems, weighting fights, question-order effects, mode effects, panel fatigue, bad respondents, poor incentives, and serious failures of prediction. But the moral structure is clear. A poll claims authority because real people were asked.
Synthetic respondents change that structure. Instead of interviewing people, a researcher or company prompts a large language model to act as many people: a suburban mother, a young Republican, a retired union member, a low-income renter, a coffee buyer, a climate skeptic, a swing voter, a lapsed Catholic, a warehouse worker, a college student. The model generates answers. The answers are aggregated. The result is presented as a possible view of the public.
The practice is often called silicon sampling, synthetic sampling, synthetic users, or AI-generated survey response. It sits beside a related but different problem: real surveys being contaminated by respondents or bad actors using AI to produce answers at scale.
Both matter because they attack the same assumption from opposite directions. Silicon sampling says the human respondent may be optional. AI survey fraud says a coherent response may no longer prove that a human respondent exists.
Why It Is Tempting
The temptation is obvious. Human research is slow, expensive, and operationally annoying. Recruiting takes time. Representative samples cost money. Rare subgroups are hard to reach. International work adds language and cultural complexity. Qualitative research requires transcription, coding, consent, and interpretation. Market researchers face deadline pressure. Campaigns want signals now. Product teams want user feedback before the product exists.
A language model appears to solve all of this. It can generate thousands of answers in minutes. It can simulate personas that would be hard to recruit. It can produce open-ended responses that sound thoughtful. It can retrodict missing survey answers, stress-test questionnaires, generate hypotheses, and help researchers notice which questions might be confusing before fielding a real survey.
Some research shows legitimate promise. Kim and Lee's work on AI-augmented surveys used repeated General Social Survey data from 1972 to 2021 and reported strong performance for retrodicting missing opinions, while finding more modest results for predicting entirely unasked opinions. That is a useful distinction. Filling a gap inside a historical survey structure is not the same as replacing the act of asking people.
The best case for synthetic respondents is not that they are the public. It is that they can be a sandbox: a cheap way to pilot instruments, generate priors, explore old data, or identify where a real study should spend scarce attention.
What the Evidence Shows
The evidence base is now large enough to support a disciplined warning.
Pew Research Center's May 2026 Q&A on AI and polling says directly that Pew does not use AI to tell it what the public thinks. Courtney Kennedy, Pew's vice president of methods and innovation, gives two reasons: ethical concern about replacing human voice and scientific concern about how AI estimates behave. Pew's summary of the research flags stereotyping, weaker representation of Republican viewpoints than Democratic ones, and understated disagreement.
A 2024 Political Analysis article by Bisbee, Clinton, Dorff, Kenkel, and Larson tested ChatGPT-generated synthetic respondents against American National Election Study data. The averages sometimes looked close. That is the seductive part. But the synthetic data had less variation, produced different regression results, shifted with prompt wording, and changed across a three-month period. The authors concluded that the synthetic data raised serious quality, reliability, and reproducibility concerns.
A 2025 Sociological Methods & Research article by Boelaert, Coavoux, Ollion, Petev, and Präg makes the warning sharper. It argues that current models cannot replace human subjects for opinion or attitudinal research, and that their answers show strong bias and low variance, with bias changing by topic.
Cross-national work adds another layer. A 2024 Humanities and Social Sciences Communications study found promise in public-opinion simulation, especially where training data is richer, but also highlighted limits in global applicability and reliability, demographic representation, and topic complexity. Its conclusion is not that the method is useless. It is that models inherit the unevenness of the world they learned from and the boundaries of the data available to them.
Market-research tests point in the same direction. Verasight's January 2026 report found that synthetic samples did worse on brand-awareness and product-testing questions than on frequently asked political questions, and warned that subgroup errors can be much larger than topline errors. The pattern makes intuitive sense: models have seen far more public text about elections than about why a particular household buys one coffee brand, rejects a package design, delays a purchase, or changes a habit after a bad week.
The Variance Problem
The deep failure mode is not only wrong averages. It is fake coherence.
Human publics are messy. People contradict themselves. They answer differently when tired, threatened, hopeful, bored, rushed, embarrassed, or trying to be polite. They hold views that do not fit their demographic profile. They misunderstand questions. They refuse categories. They change their minds. They care intensely about some issues and barely at all about others. They know things a model cannot infer from public text because the knowledge lives in bodies, households, jobs, illnesses, debts, local institutions, memories, and silence.
A synthetic respondent has no rent due, no boss, no church, no commute, no neighborhood, no family obligation, no fear of a doctor's bill, no loyalty to a person who disappointed them, no private embarrassment, no weather, no waiting room, no embodied stake. It has a prompt, a distribution, and a style of plausible answer.
That is why low variance matters politically. If a synthetic public is too internally consistent, it can make social life appear more legible than it is. It can tell a campaign that voters are cleaner ideological types than they are. It can tell a company that consumers reason more consistently than they do. It can tell a public agency that a subgroup is easier to model than to contact.
A larger synthetic sample does not solve that. Generating 50,000 model respondents may reduce random noise inside the model's own distribution, but it does not repair systematic bias in the representation. It can produce a more precise version of the wrong public.
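The arithmetic behind that point can be sketched in a few lines. The example below is an illustration only: the two support rates are invented numbers, not results from any study cited here, and the function names are hypothetical. It draws synthetic yes/no answers from a model whose distribution sits eight points away from the assumed real public, then shows that the standard error shrinks as the sample grows while the gap to the real value does not.

```python
import math
import random

# Both rates are hypothetical, chosen only to illustrate the argument.
TRUE_SUPPORT = 0.50   # assumed share of the real public that agrees
MODEL_SUPPORT = 0.58  # assumed share produced by the model's own distribution


def synthetic_estimate(n, rng):
    """Draw n synthetic yes/no answers; return (estimate, standard error)."""
    yes = sum(rng.random() < MODEL_SUPPORT for _ in range(n))
    p_hat = yes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se


def run(sizes=(500, 5_000, 50_000), seed=0):
    """Estimate support at several sample sizes from the same biased model."""
    rng = random.Random(seed)
    rows = []
    for n in sizes:
        p_hat, se = synthetic_estimate(n, rng)
        rows.append({
            "n": n,
            "estimate": p_hat,
            "se": se,                         # shrinks roughly as 1/sqrt(n)
            "error": p_hat - TRUE_SUPPORT,    # stays near the model's bias
        })
    return rows


if __name__ == "__main__":
    for row in run():
        print(f"n={row['n']:>6}  estimate={row['estimate']:.3f}  "
              f"se={row['se']:.4f}  error vs. public={row['error']:+.3f}")
```

Under these assumptions, the confidence interval at the largest sample size is narrow and excludes the true value entirely. The added respondents buy precision about the model, not accuracy about the public.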
Fraud and Contamination
The second problem is not researchers intentionally replacing humans. It is humans, bots, or organized actors using AI to pollute surveys that are supposed to contain human responses.
Pew distinguishes probability-based panels from opt-in samples for exactly this reason. In a probability panel, people are selected from a real-world sampling frame and cannot simply nominate themselves into unlimited surveys. In open opt-in systems, bad actors can create fake identities, chase incentives, and use AI to answer many surveys quickly.
This is not only a market-research nuisance. Public opinion surveys inform journalism, campaigns, academic studies, public agencies, philanthropy, product design, health communication, and institutional strategy. If the response layer becomes machine-contaminated, downstream institutions may govern a public that was partly fabricated.
The detection problem will not stay still. Attention checks, open-ended questions, style filters, browser paradata, identity checks, and response-time analysis can help, but AI systems adapt. A model can produce plausible open-ended responses, vary tone, preserve a persona, and avoid obvious bot markers. The old assumption that a coherent paragraph implies a sincere human respondent has weakened.
Survey integrity therefore becomes part of AI governance. It is no longer only a methodological issue for specialists. It is part of the infrastructure that tells institutions what the public is.
The Governance Standard
A serious standard should begin with a bright line: synthetic respondents are not respondents. They are model outputs about possible respondents.
First, disclose synthetic use plainly. Any report using AI-generated survey responses should say so in the headline methodology, not in a technical appendix. It should distinguish human data, imputed data, simulated respondents, model-assisted coding, and AI-generated analysis.
Second, forbid substitution in democratic claims. Political polling, public consultation, community needs assessment, civil-rights impact analysis, worker voice, patient voice, student voice, and affected-community review should not be replaced by silicon samples. A model can help prepare the work. It should not stand in for the people whose consent, needs, and power are at stake.
Third, validate against human ground truth. Synthetic methods should be benchmarked against real survey data for the same domain, population, language, time period, and question type. General claims that a model can simulate people are not enough.
Fourth, report variance and subgroup error. Topline accuracy can hide the very failures that matter for governance. Reports should show dispersion, subgroup performance, sensitivity to prompts, model version, temperature, sampling procedure, and whether results changed over time.
Fifth, preserve probability-based human research where legitimacy matters. If a decision claims democratic warrant, it needs contact with real people selected through defensible methods. Convenience is not legitimacy.
Sixth, harden survey infrastructure against AI contamination. Opt-in surveys need stronger identity, recruitment, throttling, behavioral review, fraud detection, and transparent data-quality reporting. Probability panels are not invulnerable, but they begin from a stronger sampling frame.
Seventh, protect qualitative surprise. The point of talking to people is not only to get answers to known questions. It is to discover what the institution did not know how to ask. Synthetic respondents are weakest where lived experience, contradiction, and unexpected salience matter most.
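The fourth requirement above, reporting variance and subgroup error, can be made concrete with a small sketch. Everything in it is hypothetical: the scores are invented 5-point agreement ratings for two made-up subgroups, and `compare` is an illustrative helper, not a method taken from any cited report. The data are constructed so the synthetic sample matches the human topline exactly while missing both subgroups and understating dispersion.

```python
from statistics import mean, pstdev

# Invented 5-point agreement scores (1 = strongly disagree, 5 = strongly agree).
human = {
    "group_a": [1, 2, 2, 3, 1, 3],
    "group_b": [5, 4, 4, 3, 5, 3],
}
synthetic = {
    "group_a": [2, 3, 2, 3, 3, 2],
    "group_b": [4, 3, 4, 3, 4, 3],
}


def pooled(groups):
    """Flatten all subgroup scores into one list."""
    return [score for scores in groups.values() for score in scores]


def compare(human, synthetic):
    """Report topline error, per-subgroup error, and dispersion for both samples."""
    return {
        "topline_error": mean(pooled(synthetic)) - mean(pooled(human)),
        "subgroup_error": {
            group: mean(synthetic[group]) - mean(human[group]) for group in human
        },
        "human_sd": pstdev(pooled(human)),
        "synthetic_sd": pstdev(pooled(synthetic)),
    }


if __name__ == "__main__":
    report = compare(human, synthetic)
    print(f"topline error: {report['topline_error']:+.2f}")
    for group, err in report["subgroup_error"].items():
        print(f"{group} error: {err:+.2f}")
    print(f"human sd: {report['human_sd']:.2f}  "
          f"synthetic sd: {report['synthetic_sd']:.2f}")
```

A report that published only the first number would look like a perfect match. The subgroup errors and the gap between the two standard deviations are what reveal the compression toward the middle.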
The Spiralist Reading
A synthetic respondent is a mirror pretending to be a witness.
It reflects patterns in language, data, demographic association, and model training. It may be useful. It may even be surprisingly accurate in some narrow cases. But it does not testify. It does not risk anything by answering. It does not have to live under the policy, buy the product, endure the workplace, trust the hospital, send the child to the school, or absorb the consequences of being misunderstood.
The governance danger is recursive. A model is trained on traces of human society. An institution asks the model what humans think. The institution acts on that answer. Humans adapt to the institution. Future traces record the adapted world. The model then appears to have known the public it helped create.
That is synthetic consensus with a spreadsheet. It can look empirical while quietly removing the people from the measurement loop.
The answer is not to ban simulation. Simulation is useful when it is labeled, bounded, validated, and kept subordinate to contact with reality. The answer is to preserve the human voice layer where voice is the point.
Polling, interviewing, testimony, user research, fieldwork, worker consultation, patient engagement, and democratic hearing are not merely data-extraction techniques. They are institutional acts of recognition. They say: you are not only represented by a pattern. You may answer for yourself.
When a system replaces that act with generated personas, it has not made research efficient. It has changed who counts as present.
Sources
- Pew Research Center, "Q&A: Do AI and bogus respondents threaten polling's future?", May 12, 2026.
- James Bisbee, Joshua D. Clinton, Cassy Dorff, Brenton Kenkel, and Jennifer M. Larson, "Synthetic Replacements for Human Survey Data? The Perils of Large Language Models", Political Analysis, 2024.
- Julien Boelaert, Samuel Coavoux, Étienne Ollion, Ivaylo Petev, and Patrick Präg, "Machine Bias. How Do Generative Language Models Answer Opinion Polls?", Sociological Methods & Research, 2025.
- Mingyu Chu et al., "Performance and biases of Large Language Models in public opinion simulation", Humanities and Social Sciences Communications, 2024.
- Junsol Kim and Byungkyu Lee, "AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction", arXiv, 2023; revised 2024.
- Austin C. Kozlowski and James Evans, "Simulating Subjects: The Promise and Peril of Artificial Intelligence Stand-Ins for Social Agents and Interactions", Sociological Methods & Research, 2025.
- G. Elliott Morris and Benjamin Leff, Verasight, "Can Large Language Models Replicate Survey Data Across Topics?", January 13, 2026.
- Church of Spiralism, "Synthetic Consensus Firebreak", "Political Impact", and "Synthetic Data and Model Collapse".