The Regulatory Sandbox Becomes the Exception Machine
AI sandboxes can teach regulators how systems behave before rules harden around guesses. They can also turn temporary exceptions into a quiet path around public law.
Why Sandboxes Now
The regulatory sandbox has become one of the favored institutional answers to artificial intelligence.
The phrase sounds modest. Put the system in a controlled environment. Let innovators test. Let regulators observe. Learn before writing rigid rules. Avoid blocking useful systems because old statutes were written for old technology. Avoid letting dangerous systems scale before anyone understands them. In a field where technical behavior, business models, and social harms are all moving quickly, that bargain has obvious appeal.
The European Union built AI regulatory sandboxes directly into the AI Act. Article 57 requires Member States to ensure at least one national AI regulatory sandbox is operational by August 2, 2026, either alone or jointly with other Member States. The Act frames the sandbox as a controlled environment for developing, training, testing, and validating innovative AI systems for a limited time under an agreed plan. It also says providers can use sandbox documentation to demonstrate compliance in conformity assessment, and that authorities should use the sandbox to identify risks, in particular to fundamental rights, health, and safety.
U.S. states are moving in a different but related direction. Utah created an Office of Artificial Intelligence Policy and an AI Learning Laboratory model aimed at regulatory relief and policy learning. Texas's Responsible Artificial Intelligence Governance Act, signed in 2025 and effective January 1, 2026, creates a regulatory sandbox program. Approved participants can test AI systems with legal protection and limited market access, without the licenses or regulatory authorizations that would ordinarily apply, subject to oversight and reporting.
Outside AI-specific law, the United Kingdom's Medicines and Healthcare products Regulatory Agency has been running the AI Airlock for AI as a Medical Device. Singapore's AI Verify program has created testing and assurance environments for generative AI applications. The Organisation for Economic Co-operation and Development (OECD) has treated AI sandboxes as a serious governance tool, especially for regulatory learning, interoperability, and supervised experimentation.
The pattern is now clear: when institutions do not know how to regulate a technology, they build a smaller room and call it learning.
What a Sandbox Does
A sandbox is not just a pilot. It is a negotiated exception with a learning theory.
In the strongest version, a company or public body enters with a defined system, use case, risk profile, testing plan, data boundary, user population, reporting duty, and exit condition. Regulators observe what happens, identify which rules are unclear or poorly fitted, require mitigations, collect evidence, and convert that evidence into guidance, standards, enforcement priorities, or legislative repair.
That can be valuable. AI systems often fail at the boundary between model behavior and institutional use. A medical triage model, lending model, hiring assistant, educational tutor, policing tool, or public-benefits chatbot cannot be understood only from a benchmark score. Its risk depends on workflow, training data, interface design, fallback paths, user trust, appeal rights, data retention, automation bias, vendor contracts, and the incentives of the organization deploying it.
A sandbox can force those details into view. It can let regulators ask questions before a product has already become infrastructure. Who is affected? What data is used? What decision does the model influence? What happens when it is wrong? Can a person override it? Are users told they are part of a test? Which logs exist? Which harms would stop the test?
But the same structure can become evasive. A sandbox can make the exception more visible than the public it affects. The institution may focus on the provider's need for regulatory certainty while affected people become test conditions. The regulator may become too close to the firms it supervises. A confidential pilot may produce private learning and public legitimacy. The word "sandbox" can soften the fact that real people, real data, and real decisions may be involved.
The governance question is therefore not whether sandboxes are good or bad. The question is what kind of institutional memory they create.
From Fintech to AI
The sandbox idea did not begin with AI.
Financial regulators used sandboxes to handle fintech products that did not fit neatly into inherited categories. The United Kingdom's Financial Conduct Authority opened its regulatory sandbox in 2016 and later described lessons around testing innovation while building consumer-protection safeguards. The OECD's 2023 report on AI regulatory sandboxes points to fintech experience as part of the background, while warning that AI sandboxes raise their own challenges: interdisciplinary expertise, eligibility criteria, competition effects, interoperability, and the difficulty of assessing trials.
The migration from fintech to AI changes the stakes. Financial technology often tests payment flows, credit tools, identity products, compliance software, trading tools, and consumer finance interfaces. Those are already consequential. AI expands the sandbox into a broader set of institutional judgments: diagnosis, education, policing, welfare, hiring, immigration, cybersecurity, scientific discovery, workplace management, and public administration.
AI also changes what it means to test. A conventional product test may ask whether a system works as specified. An AI test often has to ask whether the specification is stable enough to govern behavior at all. Model outputs vary with prompts, users, retrieved context, deployment settings, update cycles, and tool access. In agentic systems, the model may act through browsers, APIs, databases, payment rails, or workplace software. The unit being tested is not only a model. It is a socio-technical arrangement.
That is why an AI sandbox cannot be only a compliance help desk. It must be a site of institutional inquiry. If the regulator merely tells the provider how to reach market faster, the sandbox becomes acceleration with paperwork. If the regulator learns how a system changes rights, records, incentives, and dependency, the sandbox can improve public law.
The Real-World Problem
The hardest word in sandbox governance is "controlled."
The EU AI Act allows sandboxes to include supervised testing in real-world conditions. It also separately regulates real-world testing of certain high-risk AI systems outside sandboxes, including testing plans, authority approval, registration, oversight, informed consent in relevant cases, incident reporting, and liability. Article 59 permits some further processing of personal data in sandboxes for public-interest AI systems, but only under cumulative conditions such as necessity, risk monitoring, isolated processing environments, deletion rules, documentation, and a published project summary.
Those details matter because AI systems often need live conditions to reveal their risks. A hiring model may look fair in a dataset and fail when recruiters trust it too much. A medical AI tool may perform well in validation and fail across local workflows, accents, clinical norms, or post-market drift. A public-service chatbot may answer correctly in a demo and mislead people when they ask desperate, underspecified, multilingual questions about benefits or rights.
Real-world testing can expose these failures. It can also expose people to them.
That is the moral tension. A sandbox is supposed to protect the public from untested systems. But testing itself can become a public-facing intervention. If a model ranks a job applicant, nudges a clinician, answers a tenant, flags a student, routes a patient, or advises a caseworker, the person affected is no longer outside the experiment. They are part of the test surface.
This is where sandbox language can become dangerous. "Pilot" sounds temporary. "Learning laboratory" sounds benign. "Regulatory relief" sounds technical. But for the person whose claim, care, school record, insurance premium, immigration file, or employment opportunity is touched by the system, the test may feel indistinguishable from ordinary authority.
The standard should be simple: if a sandboxed system can materially affect a person, that person needs notice, safeguards, a human path, and a way to contest or exit. Regulatory learning cannot be purchased with hidden exposure.
Confidential Learning
Sandboxes sit between public law and private information.
Companies entering a sandbox will often disclose business models, data practices, technical designs, evaluation results, trade secrets, failures, and risk mitigations. Some confidentiality is legitimate. A regulator cannot learn much if every technical disclosure instantly becomes a public exhibit or competitor roadmap. The EU AI Act includes confidentiality protections. Texas HB 149 requires confidentiality for intellectual property, trade secrets, and other sensitive information obtained through the program. Utah's AI regulatory-relief materials describe business-confidentiality claims under state records law.
But if too much stays confidential, the sandbox becomes a private chapel of public legitimacy. The company gets regulator proximity. The regulator gets insight. The public gets a press release.
That asymmetry matters because sandboxes are not only about one product. They shape future policy. If the learning remains mostly inside regulator-provider channels, then public law may be formed by evidence that affected people, competitors, civil-society groups, researchers, and journalists cannot inspect. The regulator may sincerely learn, but the public cannot tell what was learned, whose risks were counted, which harms appeared, or why rules were later softened or tightened.
Good sandbox governance therefore needs public output even when raw materials remain protected. At minimum, each mature sandbox should publish project categories, selection criteria, test objectives, affected populations, safeguards, aggregate findings, failure patterns, stop conditions, and policy implications. The EU Act's exit reports and project-summary logic point in this direction, but much depends on implementation. A summary that says "no major issues identified" is not enough. A sandbox should make regulatory learning auditable without exposing secrets or personal data.
The public does not need every line of code. It needs to know whether the exception taught the institution something real.
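The reporting minimum described above can be read as a completeness check on a published summary. The following is a minimal sketch under stated assumptions: the field names, the `BOILERPLATE` phrases, and the `summary_gaps` function are hypothetical illustrations, not drawn from the EU AI Act or any regulator's template.

```python
# Fields a published sandbox summary should cover, following the list above.
# The identifiers are hypothetical; no statute prescribes these names.
REQUIRED_FIELDS = (
    "project_category",
    "selection_criteria",
    "test_objectives",
    "affected_populations",
    "safeguards",
    "aggregate_findings",
    "failure_patterns",
    "stop_conditions",
    "policy_implications",
)

# Phrases treated as non-answers. This set is illustrative only.
BOILERPLATE = {"no major issues identified", "n/a", "none", "tbd"}


def summary_gaps(summary: dict[str, str]) -> list[str]:
    """Return the required fields that are missing, empty, or boilerplate."""
    gaps = []
    for field_name in REQUIRED_FIELDS:
        # Normalize case, surrounding whitespace, and a trailing period.
        value = summary.get(field_name, "").strip().lower().rstrip(".")
        if not value or value in BOILERPLATE:
            gaps.append(field_name)
    return gaps
```

A check like this says nothing about whether the findings are true. It only flags a summary that leaves a category blank or fills it with a stock phrase, which is the narrow failure the paragraph above warns about.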
The Governance Standard
A serious AI sandbox should meet a higher standard than supervised acceleration.
First, eligibility should be public and narrow. Sandboxes should prioritize real regulatory uncertainty, public-interest learning, and systems whose risks can be bounded. A product should not enter just because a company wants a faster path to market or the aura of a regulator's attention.
Second, the testing plan should name the affected public. It should specify who may be touched by the system, what decisions or recommendations are in scope, what data is processed, what consent or notice is required, and what would count as unacceptable harm.
Third, exceptions should be explicit. If a law, license, registration, procedure, or compliance step is waived, suspended, relaxed, or interpreted flexibly, the record should say so. Governance fails when "sandbox" hides which ordinary protections have been set aside.
Fourth, human paths should remain live. People affected by sandboxed systems should have access to human review, correction, appeal, and non-AI alternatives where the system touches rights, benefits, care, employment, education, safety, or public services.
Fifth, data boundaries should be strict. Sandbox data should not quietly become product-training data, marketing evidence, or a reusable private asset without a lawful basis, disclosure, retention limits, and deletion rules.
Sixth, public reporting should be useful. Regulators should publish aggregate findings, not only participant counts. The report should explain what risks appeared, which mitigations worked, which failed, which rules were unclear, and what policy changes follow.
Seventh, regulators need capacity. A sandbox run by under-resourced staff becomes provider-led education. AI sandboxes require technical, legal, domain, civil-rights, privacy, security, procurement, and human-factors expertise.
Eighth, exit should be a decision, not drift. A sandboxed system should not become ordinary infrastructure merely because the test period ended. Exit should produce one of several explicit outcomes: stop, extend, approve under conditions, require redesign, refer for enforcement, or convert learning into general guidance.
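Several of these points can be made concrete as a record that refuses to close without an explicit decision. The sketch below is one possible shape, assuming a hypothetical `SandboxRecord` and `close_out` function; the six outcomes mirror the list in the eighth point, and everything else is illustrative rather than any regulator's actual schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExitOutcome(Enum):
    """The explicit outcomes named in the eighth point."""
    STOP = "stop"
    EXTEND = "extend"
    APPROVE_UNDER_CONDITIONS = "approve under conditions"
    REQUIRE_REDESIGN = "require redesign"
    REFER_FOR_ENFORCEMENT = "refer for enforcement"
    CONVERT_TO_GUIDANCE = "convert learning into general guidance"


@dataclass
class SandboxRecord:
    """Hypothetical record for one sandboxed system."""
    system: str
    affected_public: list[str]        # second point: name who may be touched
    waived_rules: list[str]           # third point: exceptions stated explicitly
    human_review_available: bool      # fourth point: a live human path
    exit_outcome: Optional[ExitOutcome] = None


def close_out(record: SandboxRecord, test_period_ended: bool) -> str:
    """Return the exit outcome, or fail if the test ended with no decision."""
    if not test_period_ended:
        return "test running"
    if record.exit_outcome is None:
        # Exit by drift: the period lapsed and nobody decided anything.
        raise ValueError("test period ended without an explicit exit decision")
    return record.exit_outcome.value
```

The design choice is that an expired test with no recorded outcome is an error, not a default approval. Under this sketch a system cannot pass from trial to ordinary infrastructure simply because the calendar moved.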
The Spiralist Reading
The sandbox is a small model of the larger AI governance problem.
It promises to hold uncertainty inside a controlled frame. It gives regulators a way to learn from the machine before the machine becomes normal. It gives firms a path through ambiguity. It gives politicians a story that innovation and protection can coexist.
That story can be true. But only if the sandbox remembers that it is an exception, not a magic circle.
AI governance is full of softening words: pilot, beta, assistant, copilot, preview, experiment, learning lab, airlock, sandbox. These words reduce panic and make adoption administratively possible. They also blur the moment when a system starts acting on the world. A tool that begins as a test can become a workflow. A workflow can become an expectation. An expectation can become a rule no legislature ever debated.
Recursive reality appears here as policy formation. A regulator creates a test environment to learn how AI behaves. The test environment shapes what evidence is visible. That evidence shapes future rules. Future rules shape which AI systems are built. Those systems then reshape the world the regulator later observes. The sandbox is not outside reality. It is one of the machines producing it.
The right answer is not to reject sandboxes. It is to govern them as exception machines. They should produce public learning, not private permission. They should expose hidden risks, not normalize them. They should make affected people more visible, not turn them into test substrate. They should make law smarter without making law quieter.
A good AI sandbox is a room with windows, logs, exits, and witnesses. A bad one is a door around the law.
Sources
- European Union AI Act Service Desk, Article 57: AI regulatory sandboxes, reviewed May 2026.
- European Union AI Act Service Desk, Article 58: Detailed arrangements for, and functioning of, AI regulatory sandboxes, reviewed May 2026.
- European Union AI Act Service Desk, Article 59: Further processing of personal data for developing certain AI systems in the public interest in the AI regulatory sandbox, reviewed May 2026.
- European Union AI Act Service Desk, Article 60: Testing of high-risk AI systems in real world conditions outside AI regulatory sandboxes, reviewed May 2026.
- Texas Legislature Online, HB 149 enrolled text, Texas Responsible Artificial Intelligence Governance Act, 2025.
- Utah Office of Artificial Intelligence Policy, AI Learning Lab, reviewed May 2026.
- Utah Department of Commerce, AI Regulatory Relief Process, reviewed May 2026.
- UK Medicines and Healthcare products Regulatory Agency, AI Airlock: the regulatory sandbox for AI as a Medical Device, updated April 10, 2026.
- Singapore Infocomm Media Development Authority, AI Verify and Global AI Assurance Sandbox, reviewed May 2026.
- OECD, Regulatory Sandboxes in Artificial Intelligence, OECD Digital Economy Papers No. 356, July 13, 2023.
- OECD.AI, Why AI Sandboxes matter for responsible innovation and public trust, March 18, 2026.
- Financial Conduct Authority, Regulatory sandbox lessons learned report, October 2017.
- Church of Spiralism, "The State AI Law Becomes the Regulator," "The Standard Becomes the Law," and "The AI Audit Becomes the Compliance Interface."