Last reviewed June 24, 2026

The Scaffold Becomes the Capability Gain

The 2026 arXiv paper "Comprehensive AI governance requires addressing non-model gains," by Arthur Goemans and ten coauthors, argues that frontier AI governance cannot focus only on the base model. Its Spiralist lesson is that the scaffold, tool system, inference budget, and restricted assets around a model can become capability gains in their own right.

When the Model Is Not the System

Model-level governance asks what a model can do before release. That question still matters. But the deployed object is rarely just a model. It is a model wrapped in prompts, tools, memory, routing, retrieval, human approvals, cloud quotas, monitoring, domain data, and organizational incentives. The same base model can become a harmless assistant, a coding agent, a cyber workflow, a procurement clerk, or a lab planner depending on the system around it.

Goemans and coauthors name this problem in arXiv:2606.00047. The paper was submitted on May 1, 2026, and the arXiv record says it was accepted to the ICML 2026 position paper track. Its claim is that model-level governance becomes less effective when capability progress is driven by non-model gains: improvements that are independent of advances in the base model.

This is close to agent reliability gates, skill manifests, and agent sandboxes, but the emphasis is different. Those pages ask how a particular agent should be bounded. This paper asks where capability actually comes from after the model leaves the lab.

Three Non-Model Gains

The paper's taxonomy has three main vectors. Inference gain means capability improvement from scaling compute at test time. More search, longer reasoning, parallel sampling, verification loops, or larger inference budgets can make the same model more capable without changing its weights.
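A rough way to see why parallel sampling alone can raise capability: suppose each attempt at a task succeeds independently with probability p, and a perfect verifier can pick out any successful attempt. Both conditions are simplifying assumptions made here for illustration, not a model taken from the paper. The chance that at least one of k attempts succeeds is then 1 - (1 - p)^k. The function name `success_at_k` and the 5% starting rate are hypothetical.

```python
def success_at_k(p_single: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds.

    Assumes independent attempts and a perfect verifier (illustrative
    assumptions, not claims from the paper).
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - (1.0 - p_single) ** k


# Same hypothetical model (5% single-attempt success), growing inference budget.
for k in (1, 4, 16, 64):
    print(f"k={k:>2}  success={success_at_k(0.05, k):.3f}")
```

Under these assumptions the weights never change, yet the success rate climbs with the sampling budget. Real systems fall short of this curve when attempts are correlated or the verifier is imperfect, which is part of why the uplift has to be measured rather than assumed.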

Systems gain means capability improvement from post-training enhancements such as scaffolds, tools, prompt and context engineering, fine-tuning for narrow use cases, or multi-agent orchestration. The important governance point is that a scaffold can be a capability artifact. Once a recipe for a useful scaffold circulates, it can spread more easily than a proprietary model.

Asset gain means capability improvement from combining a model with restricted assets not available to the original model developer or tester. The paper gives examples such as government-held expertise, specialized hardware, classified data, undisclosed vulnerabilities, or non-public biological data. The same model can therefore have a different risk profile when placed beside assets the original evaluator could not inspect.

The authors also flag future non-model gains from embodiment, continual learning, and diffusion. Embodiment turns informational capability into physical action. Continual learning can change a system's behavior over its lifecycle. Diffusion can create collective effects when many systems and agents are deployed at scale.

Governance Beyond the Model

The paper does not say model-level governance is obsolete. It says model-level governance must be complemented. The listed complements include system governance, entity governance, agent governance, cloud governance, and societal resilience. That portfolio matters because each non-model gain shifts the useful point of control.

System governance watches the deployed application built on top of the model. Entity governance watches the organization: its incentives, risk processes, reporting channels, review structures, and accountability mechanisms. Agent governance focuses on delegation and autonomous interaction, including access boundaries, behavior constraints, deployment restrictions, attribution, and communication protocols. Cloud governance looks at the infrastructure layer, especially where inference scaling itself becomes a capability driver. Societal resilience accepts that some risks may escape technical control and asks how communities recover from disruption.

The governance lesson is uncomfortable but practical. A model card cannot fully certify a downstream system that adds new tools, data, agents, or inference budgets. A pre-deployment evaluation cannot anticipate every scaffold that third parties may build. A release gate that ignores non-model gains can mistake the tested artifact for the deployed one.

What This Changes

For institutions, non-model gains change procurement and audit. Buying a model API is not the same as buying a governed AI system. The audit question becomes: what uplift comes from the surrounding system, and who is responsible for measuring it? A vendor should disclose tool access, inference budgets, scaffolds, retrieval assets, autonomy level, cloud dependency, and post-deployment monitoring, not only the base model name.

For regulators, non-model gains complicate threshold rules built around model training compute, base-model evaluations, or model release decisions. Those levers remain useful, but the paper argues that capability can emerge after release through inference, systems, assets, embodiment, continual learning, and diffusion. Regulation that never leaves the model boundary will miss some of the places where risk is assembled.

For safety practice, the strongest immediate move is measurement. The paper calls for better metrics for non-model gains, post-deployment monitoring, forecasting methods, and research into governance mechanisms beyond the model level. That is modest language, but it changes the frame: the frontier is not only a model frontier. It is also an integration frontier.

Governance Standard

A serious AI governance record should separate base-model capability from non-model uplift. It should name the model, inference budget, scaffold, tool set, data assets, deployment environment, agent identities, human review points, cloud controls, and monitoring obligations. It should say which evaluations apply to the base model and which apply only to the full system.

The record should also be revised when the system changes. A new tool, longer inference budget, richer retrieval corpus, additional agent, or restricted dataset can be a governance event. Treating those changes as ordinary configuration hides the fact that capability may have moved.
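One way to picture such a record is the minimal sketch below. It is not a format the paper prescribes. The class name, the field names (adapted from the list above), and the choice of which fields count as uplift-relevant are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GovernanceRecord:
    """Hypothetical record separating the base model from the system around it."""
    base_model: str
    inference_budget: str
    scaffold: str
    tools: frozenset[str]
    data_assets: frozenset[str]
    deployment_environment: str
    agent_identities: frozenset[str]
    human_review_points: frozenset[str]
    cloud_controls: frozenset[str]
    monitoring_obligations: frozenset[str]
    base_model_evals: frozenset[str]   # evaluations run on the bare model
    full_system_evals: frozenset[str]  # evaluations valid only for the full system


# Assumption: changes to these fields may move capability, so they are
# treated as governance events rather than ordinary configuration.
UPLIFT_FIELDS = (
    "inference_budget", "scaffold", "tools", "data_assets", "agent_identities",
)


def governance_events(old: GovernanceRecord, new: GovernanceRecord) -> list[str]:
    """Return the uplift-relevant fields that differ between two revisions."""
    return [name for name in UPLIFT_FIELDS if getattr(old, name) != getattr(new, name)]


v1 = GovernanceRecord(
    base_model="example-model",  # placeholder name
    inference_budget="single sample",
    scaffold="none",
    tools=frozenset({"search"}),
    data_assets=frozenset(),
    deployment_environment="internal pilot",
    agent_identities=frozenset({"assistant"}),
    human_review_points=frozenset({"before external send"}),
    cloud_controls=frozenset({"quota cap"}),
    monitoring_obligations=frozenset({"weekly log review"}),
    base_model_evals=frozenset({"pre-release eval"}),
    full_system_evals=frozenset(),
)

# Adding a tool looks like a configuration change but is flagged for review.
v2 = replace(v1, tools=v1.tools | {"shell"})
print(governance_events(v1, v2))  # ['tools']
```

Keeping `base_model_evals` and `full_system_evals` as separate fields makes it visible when a system has changed but only the base-model evaluations are on file.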

The Spiralist rule is simple: if the scaffold changes what the model can do, the scaffold belongs in the safety case.
