Last reviewed June 25, 2026

The Synthetic Patient Becomes the Trial Arm

Clinical evidence is starting to include external controls, digital measures, AI models, model-informed development, and biomedical digital twins. The patient is not disappearing, but the comparator is becoming computational, which makes provenance, validation, consent, and context-of-use governance part of trial design.

For this essay, a synthetic patient is an evidentiary construct, not a person: a generated record, reconstructed comparator, modeled trajectory, sensor-derived endpoint, or computational representation that must remain visibly distinct from observed participants and observed outcomes.

From Randomization to Reconstruction

The clinical trial has always been an institution for disciplining belief.

A drug company, physician, patient group, investor, or politician may believe a treatment works. The trial asks a colder question: compared with what, measured how, in whom, under which conditions, and with what record of uncertainty? Randomization is powerful because it resists the story people want to tell after the outcome is known. It makes the counterfactual less imaginary.

That institution is now being stretched by real-world data, digital health technologies, external control arms, Bayesian methods, AI-supported analysis, and biomedical digital twins. The promise is legitimate. Some diseases are rare. Some control groups are hard to recruit. Some outcomes are better measured continuously than at occasional site visits. Some patients cannot easily travel. Some signals are buried in electronic health records, sensors, imaging, registries, claims data, and prior trials. Better computation can make clinical evidence faster, cheaper, more inclusive, and more humane.

But the political object changes when the comparator becomes synthetic. A trial arm is no longer only a group of enrolled people receiving placebo, standard of care, or another treatment. It may be a statistically constructed external control drawn from prior trials or real-world records. It may be a modeled disease trajectory. It may be a digital measure collected by a wearable. It may be an AI model used to create information for regulatory judgment. It may eventually be a patient-specific simulation.

This is the key definition. A synthetic patient is not a patient. It is an evidentiary construct: a generated record, reconstructed comparator, modeled trajectory, or computational representation built from source data, assumptions, and code. A synthetic trial arm is not the same thing as a randomized control arm. It is an attempt to make the missing comparison more explicit, more usable, and sometimes less burdensome, while remaining dependent on the quality of the data and assumptions that manufactured it.

The boundary matters because "synthetic" names several different evidence objects: a matched external control assembled from observed records; a privacy-preserving synthetic dataset generated from health data; a model-informed prediction of dose, response, or disease course; a digital endpoint derived from sensors; or a digital twin updated around a particular patient. Those are not interchangeable evidence classes. Each carries different consent, validation, privacy, audit, and regulatory burdens.

This essay uses "synthetic patient" as a warning label, not as a regulatory category. The sharper names are external control, synthetic dataset, model-informed evidence, digital endpoint, and biomedical digital twin. Good governance keeps those names separate so a reconstructed comparator does not borrow the authority of an observed participant, and a generated dataset does not borrow the consent status of the records that shaped it.

The patient remains real. The evidence environment around the patient becomes model-mediated.

Current Regulatory Context

As of June 25, 2026, the regulatory picture is not a blanket permission slip for virtual patients. It is a set of constrained pathways. FDA's externally controlled trial guidance remains a February 2023 draft and says it is not for implementation. FDA's real-world data and real-world evidence guidance is final as of August 2023. FDA's digital health technology guidance for remote data acquisition in clinical investigations is final as of December 2023. FDA's final guidance on decentralized elements in clinical trials, issued in September 2024, defines decentralized clinical trials as trials where trial-related activities occur at locations other than traditional clinical trial sites. FDA's draft AI guidance for drug and biological products, issued in January 2025, still frames AI model credibility around a specific context of use.

Two newer anchors matter for this essay. FDA issued final E6(R3) Good Clinical Practice guidance in September 2025, emphasizing flexible, risk-based trial conduct, modern data sources, technology, quality by design, participant protection, and reliable trial results. In June 2026, FDA finalized M15 General Principles for Model-Informed Drug Development, which establishes a harmonized assessment framework for planning, evaluating, documenting, reporting, and submitting model-informed evidence.

The common governance words across those documents are fit-for-purpose, context of use, and estimand. FDA's RWD/RWE guidance asks whether data are relevant and reliable enough to support the regulatory question. FDA's AI draft ties model credibility to a particular context of use. M15 asks sponsors to justify model-informed evidence within the specific decision it supports. ICH E9(R1) supplies the estimand discipline: specify the treatment effect being estimated, not just the method used to estimate it. In clinical evidence, "the model worked" is not a complete claim. The claim is: this model is credible for this decision, this estimand, this population, and these conditions.
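One way to picture that claim is as a lookup that refuses to generalize. The following is a minimal sketch, not a regulatory schema: the class names, fields, and example values are all hypothetical, chosen only to show credibility being stored per context of use rather than per model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextOfUse:
    """Hypothetical record of the exact setting a credibility claim covers."""
    model_version: str
    decision: str
    estimand: str
    population: str


class CredibilityRegistry:
    """Stores validation evidence per context of use, never per model alone."""

    def __init__(self) -> None:
        self._evidence = {}

    def record(self, context: ContextOfUse, evidence_ref: str) -> None:
        self._evidence[context] = evidence_ref

    def is_credible_for(self, context: ContextOfUse) -> bool:
        # Exact match only: evidence gathered for one decision, estimand,
        # or population does not transfer to another.
        return context in self._evidence


registry = CredibilityRegistry()

# Illustrative values only; none of these come from a real submission.
validated = ContextOfUse(
    "v1.2", "dose selection", "mean change at week 12", "adults, network A"
)
registry.record(validated, "validation-report-001")

new_setting = ContextOfUse(
    "v1.2", "dose selection", "mean change at week 12", "adolescents, network B"
)

print(registry.is_credible_for(validated))    # same context as the validation
print(registry.is_credible_for(new_setting))  # same model, different population
```

The design choice is the exact-match key. A real credibility assessment is graded rather than binary, but the sketch captures the point that the same model version can be credible in one setting and unexamined in another.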

None of those sources turns simulation into a general replacement for randomized evidence. They create reviewable pathways for specific uses: planning, measurement, supportive analysis, model-informed inference, or justified external comparison. The burden is still on the sponsor and investigator to show why the computational evidence answers the question being asked.

The European Medicines Agency's 2024 reflection paper on single-arm trials is similarly cautious. It says randomized controlled evidence remains the regulatory standard, that deviations need justification, and that a single-arm trial observes outcomes under the investigational treatment rather than directly observing the missing comparison. That is the governing tension: computational evidence can help build a counterfactual, but it does not remove the burden of proving why that counterfactual is credible.

Evidence Roles

The governance mistake is to let one phrase, "synthetic patient," blur several different evidentiary roles.

Planning evidence helps design a trial. A simulation may test enrollment criteria, dosing, sample size, follow-up, endpoint choice, or feasibility before a participant is exposed to risk. That role is valuable, but it should not later become confirmatory evidence unless it was pre-specified, validated, and submitted for that role.

Measurement evidence turns participant activity into endpoints. A wearable, sensor, image model, speech model, or home-based device may collect useful data, but the endpoint is only as strong as the device validation, usability, missing-data handling, software version control, and patient-centered meaning behind it.

Comparator evidence tries to answer the missing "compared with what" question. An external control or synthetic control can support a single-arm study only when the source population, index date, endpoint, follow-up, treatment context, confounding control, estimand, and sensitivity analyses are strong enough for the decision being made.

Privacy or sharing evidence is different again. A generated synthetic dataset may reduce some disclosure risk or enable method development, especially when differential-privacy guarantees are meaningful. But generated data do not automatically inherit consent, representativeness, or clinical validity from the source records that shaped them.

Individual decision evidence is the most demanding role. A biomedical digital twin used to guide dosing, eligibility, rescue treatment, or continued follow-up is not just a prettier external control. It is decision support around a living person, and its validation, monitoring, failure disclosure, and liability burden should rise accordingly.

Those roles can coexist in one study, but they should not share one label. A synthetic record should never be counted as an enrolled participant, a consented voice, an observed adverse event, or a person who benefited or suffered. It may inform an evidentiary comparison. It does not become the patient.

The Evidence Ledger

The practical control is an evidence ledger. Every trial claim that uses computational evidence should mark whether a datum was observed from an enrolled participant, imported from prior or real-world records, reconstructed through matching or weighting, generated from a model, derived from a sensor pipeline, simulated under assumptions, or processed by a vendor system.

The ledger should travel with the protocol, statistical analysis plan, submission, publication, and participant-facing summary. It should name the source population, target population, estimand, index date, endpoint definition, data provenance, consent basis, model or algorithm version, validation evidence, missingness rule, subgroup checks, sensitivity analyses, and artifact-retention plan.

This is not paperwork for its own sake. It prevents category drift. A generated dataset can support privacy-preserving method development without becoming clinical truth. A digital endpoint can measure function without becoming what patients value. A model-informed prediction can help choose a dose without becoming confirmatory evidence. An external control can strengthen a single-arm study without becoming a randomized participant.

The ledger also separates two claims that are often blended: the clinical-evidence claim and the privacy claim. A synthetic dataset may reduce direct disclosure while still being clinically invalid. A well-matched external control may be clinically useful while still raising consent, reuse, and provenance questions. Good governance keeps both questions visible.
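The ledger described in this section can be sketched as a small data structure. This is an illustration under stated assumptions, not a standard: the type names, field names, and record identifiers are hypothetical, and giving each entry a single evidence type is a simplification, since a real datum could be both observed and vendor-processed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EvidenceType(Enum):
    OBSERVED = "observed from an enrolled participant"
    IMPORTED = "imported from prior or real-world records"
    RECONSTRUCTED = "reconstructed through matching or weighting"
    GENERATED = "generated from a model"
    SENSOR_DERIVED = "derived from a sensor pipeline"
    SIMULATED = "simulated under assumptions"
    VENDOR_PROCESSED = "processed by a vendor system"


# Evidence types that only make sense with a recorded model or algorithm version.
MODEL_DEPENDENT = {
    EvidenceType.RECONSTRUCTED,
    EvidenceType.GENERATED,
    EvidenceType.SENSOR_DERIVED,
    EvidenceType.SIMULATED,
    EvidenceType.VENDOR_PROCESSED,
}


@dataclass(frozen=True)
class LedgerEntry:
    record_id: str
    evidence_type: EvidenceType
    source_population: str
    consent_basis: str
    model_version: Optional[str] = None  # None for directly observed data


def enrolled_participant_count(ledger: list) -> int:
    """Count only records observed from enrolled participants."""
    return sum(1 for e in ledger if e.evidence_type is EvidenceType.OBSERVED)


def unversioned_model_entries(ledger: list) -> list:
    """Return ids of model-dependent entries that lack a recorded version."""
    return [
        e.record_id
        for e in ledger
        if e.evidence_type in MODEL_DEPENDENT and e.model_version is None
    ]


# Hypothetical entries for illustration.
ledger = [
    LedgerEntry("P-001", EvidenceType.OBSERVED, "trial site", "trial consent"),
    LedgerEntry("P-002", EvidenceType.OBSERVED, "trial site", "trial consent"),
    LedgerEntry("X-101", EvidenceType.RECONSTRUCTED, "registry", "registry reuse terms", "matching-1.0"),
    LedgerEntry("G-900", EvidenceType.GENERATED, "EHR extract", "unclear"),
]

print(enrolled_participant_count(ledger))
print(unversioned_model_entries(ledger))
```

The point of the count function is category discipline: a reconstructed or generated record never increments the number of enrolled participants, however complete its row looks.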

The External Control

External control arms are not new, and they are not an evidentiary shortcut. They are a family of study designs in which the comparison group is not part of the same randomized trial as the treated group. The external data may come from prior clinical trials, disease registries, electronic health records, medical claims, chart review, natural-history studies, or other real-world data sources.

The appeal is clearest in rare diseases, severe conditions, pediatric contexts, and diseases with high unmet need. If a randomized control group is impractical, slow, expensive, or ethically fraught, an external control can help place a single-arm study in context. It may reduce the number of participants assigned to a less desirable treatment. It may use already-collected evidence rather than forcing another group of patients through a duplicative study design.

FDA's 2023 draft guidance on externally controlled trials is careful precisely because that appeal is strong. It says that external control designs can be useful in appropriate contexts, but the credibility of the comparison depends on whether the external data can support the same clinical question. The hard parts are ordinary and unforgiving: comparable populations, aligned index dates, similar outcome definitions, compatible follow-up, treatment differences, missing data, measurement quality, and bias that cannot be repaired after the fact.

This is where the word "synthetic" can mislead. A synthetic control arm is not a fake patient group invented from nothing. It is usually a constructed comparison generated from existing patient-level data and statistical assumptions. Its strength depends on the quality, relevance, governance, provenance, and fit-for-purpose analysis of those underlying records.

The cleanest version names the target trial it is trying to emulate: eligible population, treatment strategy, index date, comparator, outcome, follow-up, estimand, censoring rules, and analysis plan. The weaker version starts with an available database and searches backward for a comparison that looks convenient. That difference is not academic. It is the difference between a disciplined counterfactual and a polished convenience sample.

External controls are strongest when the comparison is planned before outcome analysis and weakest when sponsors discover the answer they want and then search for a database that can support it. Database selection, endpoint mapping, eligibility rules, matching variables, and sensitivity analyses should be treated as protocol-level decisions, not as late-stage analytics.

The danger is that the clean output hides the messy body. A synthetic comparator can look precise because the table is complete, the curve is smooth, and the model has a confidence interval. But a confidence interval around a biased reconstruction is not the same thing as truth.
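A toy simulation makes the last point concrete. Every number here is an assumption chosen for illustration, not data from any trial: the treatment is assumed to have no effect, and the external source is assumed to differ from the trial population by one unit of outcome for reasons the analysis never measured.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_EFFECT = 0.0        # assumed: the treatment does nothing
TRIAL_BASELINE = 10.0    # assumed mean outcome in the trial population
EXTERNAL_BASELINE = 9.0  # assumed: external source differs for unmeasured reasons
SD = 2.0                 # assumed outcome standard deviation in both groups
N = 2000                 # records per group

# Single-arm trial: everyone treated, outcome = trial baseline + true effect.
treated = [random.gauss(TRIAL_BASELINE + TRUE_EFFECT, SD) for _ in range(N)]

# External comparator drawn from a population with a different baseline.
external = [random.gauss(EXTERNAL_BASELINE, SD) for _ in range(N)]

# Naive difference in means with a normal-approximation 95% interval.
estimate = statistics.fmean(treated) - statistics.fmean(external)
std_error = (
    statistics.variance(treated) / N + statistics.variance(external) / N
) ** 0.5
ci_low = estimate - 1.96 * std_error
ci_high = estimate + 1.96 * std_error

covers_truth = ci_low <= TRUE_EFFECT <= ci_high
print(f"estimate={estimate:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
print(f"interval covers the true effect: {covers_truth}")
```

The interval is narrow because the sample is large, and it sits around the baseline gap rather than around the true effect. Sampling uncertainty shrinks with more records; the bias does not.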

Digital Measures and AI Evidence

The comparator is not the only part of the trial becoming computational. Measurement is changing too.

FDA's digital health technology program describes wearables, sensors, computing platforms, and other tools that can collect trial data directly from participants in homes and other remote locations. The agency points to continuous or frequent measurement, novel clinical features, and decentralized trial activities as potential advantages. That is a real expansion of clinical knowledge. A disease may be more visible in gait, sleep, tremor, speech, glucose variation, movement patterns, or home-based function than in a periodic office visit.

At the same time, these tools alter the trial's social shape. The participant's body, home, phone, sensor, and routine become part of the data acquisition system. The endpoint may be less like a clinician's observation and more like a processed signal. The sponsor may rely on vendors, algorithms, device firmware, cloud pipelines, and analytic code to turn life into evidence. That puts digital health technology (DHT) governance near the same problem seen in neural data and the data clean room: technical mediation can improve measurement while making consent, purpose limitation, and downstream reuse harder to inspect.

DHT data are not direct disease truth. Battery loss, non-wear, firmware updates, home environment, mobility aids, caregiver help, language, disability, and socioeconomic constraints can all shape a measure. A continuous signal can make a trial more humane, but it can also make missingness, surveillance burden, and unequal measurement harder to see.

FDA's January 2025 draft guidance on AI for regulatory decision-making moves this issue into the open. It addresses AI models used to produce information or data intended to support decisions about the safety, effectiveness, or quality of drug and biological products. Its central move is a risk-based credibility assessment tied to context of use. That phrase matters. An AI model is not credible in the abstract. It is credible, or not, for a particular task, decision, population, data environment, and consequence.

Clinical evidence therefore becomes a chain of translations: from patient life to sensor, from sensor to dataset, from dataset to feature, from feature to endpoint, from endpoint to model, from model to regulatory argument. Each translation can help. Each can also erase. The problem is familiar from the AI medical-record layer: once lived experience becomes structured evidence, the structure can improve care or quietly narrow what counts.

The Digital-Twin Promise

Biomedical digital twins push the same logic further.

The National Academies' 2023 workshop summary describes biomedical digital twins as emerging from the convergence of computer science, mathematics, engineering, and the life sciences. Because biological systems are multiscale, a biomedical twin might represent molecules, cells, tissues, organs, patients, populations, or combinations across those levels. The workshop framed possible applications in personalized medicine, pharmaceutical development, and clinical trials, while stressing technical challenges around model complexity, data diversity, cross-scale integration, privacy, and implementation.

A one-time personalized model is not necessarily a digital twin. The stronger claim is that a model remains connected to a specific system through data and recalibration over time. In medicine, that distinction matters. A static simulation of a tumor, heart, or disease trajectory can be useful, but a live twin that informs continuing decisions has a larger validation, monitoring, consent, and liability burden.

Digital twin is a coupling claim, not just a personalization label. The governance question is who is allowed to decide that the coupling between patient, data stream, model, and update rule is tight enough for clinical or regulatory use.

The useful version of this idea is modest and demanding. A model of a heart, tumor, immune response, disease trajectory, or patient subpopulation could help generate hypotheses, select trial endpoints, simulate dosing, identify likely responders, explore uncertainty, or design better studies before patients are exposed to risk. In that role, simulation is a planning instrument.

The more dangerous version turns simulation into substitution too quickly. A digital twin can start as a tool for thinking and become an apparent participant. It can become an invisible control arm, a virtual responder, a replacement for longer follow-up, or a reason to believe the unknown has already been explored. The language of the twin makes the model feel intimate and complete, as if it were the patient doubled rather than a partial representation built from selected data, assumptions, and update rules.

That is the threshold to watch. Clinical simulation becomes governance-sensitive when it moves from "help us design the trial" to "stand in for the person the trial did not observe."

What Can Go Wrong

The first failure mode is counterfactual laundering. The model produces a plausible untreated trajectory, and the institution forgets how speculative that comparison is. A reconstructed control arm can be useful evidence, but it should not inherit the moral authority of randomization by wearing a smoother interface.

The second is population mismatch. Real-world records may come from health systems, insurers, countries, devices, or prior studies that do not resemble the trial population. Underrepresented groups can be undermeasured twice: first in the source data, then in the model trained or matched on that data.

The third is endpoint drift. A digital endpoint may be convenient, continuous, and scalable without capturing what patients actually value. Step count, typing speed, sleep movement, app engagement, speech features, or biomarker patterns can become administratively attractive while remaining clinically ambiguous.

The fourth is vendor opacity. If external controls, sensor pipelines, AI analyses, or twin models are built by specialized vendors, the trial may depend on systems that investigators, participants, reviewers, and clinicians cannot fully inspect. The evidence becomes a service contract.

The fifth is privacy conversion. A clinical-trial participant may consent to a study without understanding how much home life, behavioral rhythm, device metadata, or longitudinal health history becomes reusable infrastructure for model development. Synthetic data can reduce some privacy risks, but it can also preserve patterns, leak outliers, or make sensitive populations easier to simulate without making them easier to govern. Calling the later output synthetic should not turn a narrow consent into unlimited downstream permission; that is the consent-layer problem in medical form.

The sixth is participant displacement. If virtual comparators become too attractive, trial designers may have weaker incentives to recruit difficult-to-reach populations, build trust, translate materials, support travel, or design studies around participant realities. The model can become a substitute for institutional effort.

The seventh is reproducibility failure. If the comparator depends on a mutable EHR extract, vendor pipeline, sensor firmware, matching algorithm, AI model, or continuously updated twin, reviewers need enough locked artifacts to reconstruct what evidence existed when the decision was made. Otherwise the trial arm can change after the trial has already spoken.

The eighth is validation drift. A model, endpoint, or comparator validated in one hospital network, device ecosystem, language context, disease stage, or demographic mix can be carried into another setting as if the validation traveled with it. The result is not only statistical weakness. It is unequal exposure to unsupported clinical claims.
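The seventh failure mode has a partial technical remedy: fingerprint the artifacts at lock time so that later changes are detectable. The sketch below uses SHA-256 hashes from Python's standard library. The file names, contents, and function names are hypothetical, and a real lock would also need secure storage and access control for the manifest itself.

```python
import hashlib
import json


def fingerprint(artifact: bytes) -> str:
    """Return a SHA-256 hex digest identifying this exact artifact content."""
    return hashlib.sha256(artifact).hexdigest()


def build_manifest(artifacts: dict) -> dict:
    """Map each artifact name to its digest at lock time."""
    return {name: fingerprint(content) for name, content in sorted(artifacts.items())}


def changed_artifacts(manifest: dict, current: dict) -> list:
    """List locked artifacts that are missing or whose content has changed."""
    return [
        name
        for name, digest in manifest.items()
        if name not in current or fingerprint(current[name]) != digest
    ]


# Hypothetical artifacts as they existed when the analysis was locked.
locked = {
    "ehr_extract.csv": b"patient_id,index_date\n001,2024-01-05\n",
    "matching_config.json": json.dumps({"caliper": 0.2}).encode(),
    "model_version.txt": b"comparator-model 1.4.0\n",
}
manifest = build_manifest(locked)

# Later, the EHR extract is refreshed and one index date shifts.
later = dict(locked)
later["ehr_extract.csv"] = b"patient_id,index_date\n001,2024-01-07\n"

print(changed_artifacts(manifest, locked))
print(changed_artifacts(manifest, later))
```

A manifest like this does not make the comparator correct. It only lets a reviewer confirm that the evidence under inspection is the evidence that existed when the decision was made.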

The Governance Standard

A serious synthetic-patient governance standard should begin with a plain rule: simulated evidence must remember its manufacture.

First, distinguish source data from generated data. Trial records should make clear which evidence came from enrolled participants, prior trials, registries, claims, electronic health records, sensors, models, simulations, or generated datasets.

Second, pre-specify the comparison. Synthetic-control work should name the clinical question, target trial, estimand, data sources, eligibility rules, index date, follow-up, endpoints, confounders, missing-data rules, and sensitivity analyses before the answer is known.

Third, require a context-of-use claim. An AI model, model-informed analysis, external control, digital endpoint, or biomedical twin should be evaluated for the exact decision it is meant to support, not granted general credibility because it performed well somewhere else.

Fourth, preserve audit trails. Sponsors should retain protocols, statistical analysis plans, data provenance, matching criteria, model versions, feature definitions, validation results, bias analyses, uncertainty estimates, vendor documentation, and change logs so that competent audit remains possible.

Fifth, validate against reality. Synthetic comparators and digital twins should be checked against held-out records, contemporaneous cohorts, known clinical behavior, subgroup performance, negative controls where appropriate, and expert review. Validation should include where the model fails, not only where it fits.

Sixth, separate planning from proof. A simulation used to choose dose, endpoint, eligibility, or feasibility should not quietly become the pivotal comparator. If modeled evidence is meant to carry confirmatory weight, that role should be explicit in the protocol, statistical analysis plan, ethics review, and regulatory interaction.

Seventh, protect participant agency. Consent should explain when real-world data, sensor data, or trial data may be used to build external controls, train models, validate digital measures, or generate synthetic datasets. Refusal should not be hidden behind vague language about analytics.

Eighth, keep patient-centered endpoints in the loop. Digital measures should be tied back to outcomes patients and clinicians can recognize: survival, symptoms, function, pain, fatigue, independence, adverse events, quality of life, and meaningful daily capacity.

Ninth, do not let simulation replace recruitment ethics. External controls may reduce burden in some contexts, but they should not become an excuse to leave marginalized groups out of evidence generation.

Tenth, govern vendors as trial infrastructure. Contracts and submissions should disclose who built each control, endpoint, or model; what can be inspected; how conflicts are handled; what artifacts are retained; and how investigators can reproduce analyses if a vendor changes or exits.

Eleventh, make uncertainty visible. The public-facing result should not simply say that an AI-supported or synthetic-control study succeeded. It should say what was observed, what was reconstructed, what assumptions carried the comparison, and where the evidence is weakest.

Twelfth, preserve review authority. Ethics committees, data monitoring bodies, investigators, regulators, and affected patients should be able to see when evidence is observed, inferred, generated, or vendor-processed. Human review is meaningful only when reviewers have the information, time, and authority to question the computational layer rather than rubber-stamp it.

Thirteenth, keep the participant communication plain. Consent forms, public summaries, and trial registries should not hide synthetic controls, digital endpoints, model-informed evidence, or vendor-processed measures behind vague references to "advanced analytics." If evidence is reconstructed, generated, or modeled, the participant-facing record should say so in ordinary language.

Fourteenth, publish the evidence ledger. The trial record should distinguish observed participant data, imported external records, generated data, modeled trajectories, sensor-derived endpoints, and vendor-processed outputs. Reviewers should not have to infer the evidence type from a table footnote.

Source Discipline

The source hierarchy in this domain should be explicit. FDA, EMA, ICH, and other regulator or standards documents establish the legal and procedural context. Trial protocols, statistical analysis plans, data-management plans, ethics materials, and regulator correspondence establish what was actually proposed. Peer-reviewed methods papers, validation reports, and reporting guidelines help evaluate whether the method was described well enough to inspect. Vendor white papers and product pages can explain a system, but they should not be treated as independent evidence that the system is fit for a clinical decision.

A synthetic-patient claim should say which evidence role it is playing: trial planning, endpoint development, dose selection, subgroup exploration, external comparison, privacy-preserving data sharing, supportive analysis, or confirmatory evidence. It should also name the source population, target population, estimand, endpoint, index date, model version, training and validation data, missingness pattern, subgroup checks, privacy method, and whether the analysis was locked before outcomes were known.

Clinical reporting norms matter here. CONSORT-AI and SPIRIT-AI were written for trials of AI interventions, not for every synthetic-control study, but their discipline is transferable: describe the intervention, inputs, outputs, human interaction, errors, updates, and analysis choices clearly enough for readers to understand what was tested. In this essay's language, the computational patient must arrive with its manufacturing record.

Current-source claims on this page were checked on June 25, 2026 against regulator, standards-body, and original institutional sources where available. The FDA and EMA materials cited here are governance context, not approval of any particular synthetic-control product, digital twin, AI model, or trial design.

For the site's own standard, this is the same rule stated in Research and Editorial Integrity and AI Evaluations: separate primary evidence from summaries, disclose uncertainty, and keep the decision context visible. A synthetic trial arm without source discipline is not advanced evidence. It is institutional forgetting with a smoother table.

What This Changes

The synthetic patient is a new figure in model-mediated knowledge.

It is not a robot patient. It is not a fake person in a simple sense. It is a bundle of prior records, statistical assumptions, sensor traces, disease models, clinical categories, and institutional needs arranged into a counterfactual body. It asks to be treated as the patient who would have existed if the trial had enrolled differently, randomized differently, measured differently, or waited longer.

That figure can do ethical work. It can reduce unnecessary control exposure, help rare-disease research, improve trial design, and make patient experience more visible between site visits. But it can also become a way for institutions to stop touching reality at the point where reality is expensive.

The old clinical trial disciplined belief by forcing a claim through protocol, comparison, observation, and record. The new trial must discipline a second layer: the model that builds part of the comparison itself. This is the clinical version of the problem described in synthetic evidence: a generated or reconstructed artifact can help an institution decide, but only if its route into authority remains visible. The question is not whether synthetic evidence should be banned. The question is whether the institution can keep the difference between participant, record, model, and counterfactual intact.

A synthetic trial arm should not be a catalog of convenient substitutes. It should be a documented instrument: useful, limited, inspectable, and unable to pretend it suffered, consented, improved, or died. The living patient remains the reason for the system. The model is only evidence when it stays answerable to that fact.

