Wiki · Concept · Last reviewed June 25, 2026

Compute Governance

Compute governance is the use of AI compute as a policy lever: measuring, allocating, monitoring, restricting, or subsidizing the chips, cloud clusters, data centers, and training runs that make advanced AI systems possible.

Category: Concept
Published: May 19, 2026
Modified: June 25, 2026
Last reviewed: June 25, 2026
Tags: AI Compute, Technical Governance, Thresholds, Cloud, Access

Snapshot

Core idea: compute is a governable input because frontier-scale AI depends on scarce chips, memory, networking, cloud contracts, power, cooling, data centers, software stacks, and operators.
Two-sided tool: compute governance can restrict dangerous accumulation and also widen access through public compute, research clouds, national compute plans, and independent evaluation capacity.
Key distinction: training compute, post-training compute, evaluation compute, inference compute, and test-time compute create different evidence, safety, energy, and access questions.
Current legal anchor: the EU AI Act presumes high-impact capabilities, and therefore classification as a general-purpose AI model with systemic risk, when a model's cumulative training compute exceeds 10^25 floating-point operations.
Governance risk: poorly scoped compute controls can become surveillance of legitimate users, incumbent protection, export-control theater, or a public subsidy for private infrastructure.
Source rule: always name the unit, workload boundary, date, legal status, and source before treating compute as evidence for capability, risk, access, or compliance.

Definition

Compute governance is the part of AI governance focused on computational resources. It asks how governments, firms, standards bodies, cloud providers, data-center operators, and public institutions should treat the compute used to train, fine-tune, serve, and operate AI systems.

The term includes restrictive tools, such as chip export controls and reporting requirements. It also includes enabling tools, such as national compute plans, public research clouds, university access programs, infrastructure measurement, and policies that keep compute from becoming an unaccountable private bottleneck.

A precise definition separates at least five workloads. Pretraining compute creates a base model. Post-training compute includes supervised tuning, reinforcement learning, distillation, fine-tuning, synthetic-data generation, and adaptation. Evaluation compute supports benchmarks, red teaming, safety cases, and audits. Inference compute runs deployed services. Test-time compute is extra runtime work spent on reasoning, search, verification, tool use, or agent loops. A rule that only sees one layer can miss risks or access needs in another.
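The point that a rule seeing only one layer can miss the others can be made concrete with a small ledger. The following is a minimal sketch, not a measurement method: the `Workload` enum, the `visible_share` helper, and every number in `example_ledger` are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Workload(Enum):
    """The five workload categories separated in the definition above."""
    PRETRAINING = "pretraining"
    POST_TRAINING = "post-training"
    EVALUATION = "evaluation"
    INFERENCE = "inference"
    TEST_TIME = "test-time"


def visible_share(ledger: dict[Workload, float], covered: set[Workload]) -> float:
    """Return the fraction of total compute that falls inside the covered workloads."""
    total = sum(ledger.values())
    if total <= 0:
        raise ValueError("ledger must contain a positive amount of compute")
    seen = sum(flop for workload, flop in ledger.items() if workload in covered)
    return seen / total


# Hypothetical ledger in FLOP. These values are illustrative, not real data.
example_ledger = {
    Workload.PRETRAINING: 4e24,
    Workload.POST_TRAINING: 1e24,
    Workload.EVALUATION: 5e23,
    Workload.INFERENCE: 4e24,
    Workload.TEST_TIME: 5e23,
}

# A rule that only counts pretraining sees well under half of this ledger.
pretraining_only = visible_share(example_ledger, {Workload.PRETRAINING})
print(f"Share visible to a pretraining-only rule: {pretraining_only:.0%}")
```

In this made-up ledger a pretraining-only rule covers about 40 percent of the compute. The real split varies by developer and product and would need to be sourced, not assumed.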

Compute governance is distinct from AI Compute as a general infrastructure topic. AI compute describes the substrate. Compute governance describes the policy, institutional, and technical choices made around that substrate.

Why Compute Is Governable

AI compute has several properties that make it attractive to policymakers. It is expensive, physically embodied, tied to specialized chips, dependent on energy and networking, and often concentrated in a small number of suppliers, cloud providers, and data-center operators.

Girish Sastry, Lennart Heim, Haydn Belfield, Markus Anderljung, Miles Brundage, and coauthors argue that AI-relevant compute is more governable than some other AI inputs because it is detectable, excludable, quantifiable, and produced through a concentrated supply chain. Those traits make compute a possible point of regulatory visibility and intervention.

This does not mean compute fully determines AI capability. Algorithms, data, talent, product integration, inference-time methods, and deployment scaffolds matter. Compute governance works best when it is treated as one policy layer rather than a complete substitute for model evaluation, security, liability, and institutional accountability.

Current Context

As of the June 25, 2026 review, compute governance has split into several live policy tracks rather than one global regime. In the European Union, Article 51 of the AI Act classifies a general-purpose AI model as a model with systemic risk when it has high-impact capabilities, and presumes high-impact capabilities when cumulative training computation is greater than 10^25 floating-point operations. The Commission can update thresholds and add benchmarks as technology changes.

In the United States, Executive Order 14110's compute-threshold reporting framework is historical rather than current because the order was rescinded on January 20, 2025. U.S. compute policy has instead moved through export controls, cloud and counter-diversion guidance, federal AI infrastructure policy, public research-compute access, and grid regulation. The Bureau of Industry and Security's May 2025 guidance explicitly treated data centers capable of operating servers containing advanced integrated circuits above a 10 megawatt threshold as deserving extra diversion scrutiny.

Public compute access is also a governance lane. The National Science Foundation launched the National Artificial Intelligence Research Resource pilot in January 2024 to give researchers access to compute, data, models, software, and education resources. NSF now describes NAIRR as a national research infrastructure supporting hundreds of research projects and thousands of students, and the NAIRR Operations Center solicitation seeks to move from pilot activity toward sustained operations.

Energy and local infrastructure have become compute-governance issues. FERC's June 18, 2026 show-cause orders directed the six regional grid operators under its jurisdiction to justify or reform tariffs for data centers and other large users, while NERC's May 2026 emerging-large-load reliability guideline calls for better planning, modeling, event recording, and coordination. That puts speed to power, ratepayer protection, reliability, and large-load transparency inside the compute-governance perimeter.

The result is a fragmented but concrete landscape: EU systemic-risk thresholds, U.S. export-control and cloud diligence, public compute access programs, data-center permitting and grid rules, sovereign AI strategies, and private frontier-safety frameworks are all trying to govern different parts of the same compute stack.

Policy Tools

Measurement and reporting. Governments can require disclosures about very large training runs, major computing clusters, chip acquisitions, data-center capacity, energy use, or high-risk deployments.

Thresholds. Laws and policies can use compute thresholds to trigger duties such as evaluation, incident reporting, security practices, or regulator notification.

Export controls. States can restrict advanced chips, semiconductor manufacturing equipment, high-bandwidth memory, cloud access, or support services when they are linked to national-security risks.

Cloud governance. Cloud providers can be asked to apply know-your-customer rules, detect unusual training activity, report very large clusters, or enforce access limits for sanctioned or high-risk actors.

Public compute allocation. Governments can fund national compute capacity, public research clouds, university access, nonprofit access, startup credits, and shared infrastructure for socially valuable AI work.

Data-center governance. Permitting, grid planning, water use, resilience, security, and community impact review can shape where and how AI infrastructure is built.

Procurement and public-interest access. Public agencies can require compute vendors to disclose workload categories, security posture, energy and water claims, subcontractors, audit rights, model-weight controls, and access terms for independent evaluators or public-interest researchers.

Evaluation gates. Compute levels can trigger safety cases, dangerous-capability evaluations, red-team access, incident-reporting duties, model-weight security plans, or post-deployment monitoring before a model or cluster moves to a higher-risk scale.

Hardware-enabled oversight. Proposals include secure logging, attestation, remote verification, usage caps, and other chip- or cluster-level mechanisms. These remain contested because they raise engineering, privacy, security, and centralization questions.

Governance Baseline

A compute-governance record should let a regulator, auditor, evaluator, researcher, or community reviewer understand what kind of compute is being governed and why. For consequential systems or facilities, preserve at least:

Workload boundary: training, post-training, evaluation, inference, test-time reasoning, synthetic-data generation, agent operation, or mixed use.
Compute measure: training FLOP, accelerator count and generation, peak and sustained throughput, memory bandwidth, interconnect, cluster size, accelerator-hours, token volume, runtime budget, or data-center MW.
Ownership and access: model developer, cloud provider, data-center operator, major tenant where public, customer category, ultimate parent where relevant, and any public subsidy or procurement route.
Safety trigger: threshold crossed, model capability indicator, deployment context, dangerous-capability concern, export-control relevance, evaluation duty, or incident-reporting duty.
Security controls: identity and access management, model-weight security, cluster logging, key management, supply-chain review, remote access policy, insider-risk controls, and auditability.
Infrastructure footprint: site power, interconnection status, grid upgrades, cooling and water use, backup generation, emissions method, local permitting, and cost-allocation terms.
Access equity: whether public agencies, independent evaluators, universities, civil society, startups, or smaller jurisdictions have meaningful compute access for testing and contesting AI systems.
Rights and limits: privacy protections, customer notice where appropriate, appeal or exemption channels, research safeguards, public reporting cadence, and retention rules.
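The baseline above can be read as a record schema. The sketch below is one possible shape, assuming free-text fields and a simple completeness check. The class name `ComputeRecord`, the field names, and the `missing_fields` helper are hypothetical and do not come from any standard or regulator.

```python
from dataclasses import dataclass, fields


@dataclass
class ComputeRecord:
    """One free-text field per baseline item; an empty string means undocumented."""
    workload_boundary: str = ""
    compute_measure: str = ""
    ownership_and_access: str = ""
    safety_trigger: str = ""
    security_controls: str = ""
    infrastructure_footprint: str = ""
    access_equity: str = ""
    rights_and_limits: str = ""


def missing_fields(record: ComputeRecord) -> list[str]:
    """List the baseline items that are still blank, in declaration order."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Hypothetical, partly completed record.
draft = ComputeRecord(
    workload_boundary="training",
    compute_measure="training FLOP, estimated",
)
print(missing_fields(draft))
```

A blank-field check only shows that something was written down. Whether an entry is accurate and sourced remains a matter for review.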

Compute Thresholds

A compute threshold is a numeric line that triggers policy obligations once a model, training run, or cluster uses enough computational resources. The attraction is administrative clarity: compute can be counted, estimated, audited, and compared more easily than many qualitative capability claims.

The EU AI Act uses training compute as one way to classify general-purpose AI models with systemic risk. Article 51 presumes high-impact capabilities when the cumulative training computation is greater than 10^25 floating-point operations, while also allowing the European Commission to update thresholds and use other indicators.

The United States also experimented with compute-threshold reporting in Executive Order 14110, which used 10^26 integer or floating-point operations of training compute for certain models and a theoretical maximum capacity of 10^20 integer or floating-point operations per second for certain training clusters. That executive order was issued on October 30, 2023 and rescinded on January 20, 2025, so it is best read as an important historical example rather than a current U.S. rule.
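To show how such thresholds might be checked against an estimate, the sketch below uses the common rule of thumb that training a dense transformer costs roughly 6 FLOP per parameter per training token. That heuristic is an assumption of this example, not a counting method prescribed by the AI Act or by the rescinded executive order. The function names and all model and cluster figures are hypothetical.

```python
# Thresholds named in the text. EO 14110 was rescinded on January 20, 2025,
# so its values appear here only as historical reference points.
EU_AI_ACT_TRAINING_FLOP = 1e25
EO_14110_MODEL_OPS = 1e26
EO_14110_CLUSTER_OPS_PER_SECOND = 1e20


def estimate_training_flop(parameters: float, training_tokens: float) -> float:
    """Rough dense-model estimate: about 6 FLOP per parameter per token (assumed)."""
    return 6.0 * parameters * training_tokens


def thresholds_exceeded(training_flop: float) -> list[str]:
    """Names of the model-level thresholds that the estimate is strictly above."""
    limits = {
        "EU AI Act Article 51 (1e25 FLOP)": EU_AI_ACT_TRAINING_FLOP,
        "EO 14110, rescinded (1e26 operations)": EO_14110_MODEL_OPS,
    }
    return [name for name, limit in limits.items() if training_flop > limit]


def cluster_peak_ops_per_second(accelerators: int, peak_ops_each: float) -> float:
    """Theoretical peak of a cluster: accelerator count times per-chip peak."""
    return accelerators * peak_ops_each


# Hypothetical models and cluster, for illustration only.
small = estimate_training_flop(parameters=7e10, training_tokens=1.5e13)  # about 6.3e24
large = estimate_training_flop(parameters=4e11, training_tokens=1.5e13)  # about 3.6e25
cluster = cluster_peak_ops_per_second(accelerators=50_000, peak_ops_each=1e15)

print(thresholds_exceeded(small))
print(thresholds_exceeded(large))
print(cluster / EO_14110_CLUSTER_OPS_PER_SECOND)  # fraction of the cluster threshold
```

Under these assumptions the smaller model sits below both lines and the larger one is above the EU presumption only. An estimate of this kind should state whether failed runs, post-training, and synthetic-data generation are included.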

Compute thresholds are useful but brittle. Sara Hooker argues that using compute thresholds as a risk proxy can overstate our ability to predict which capabilities emerge at which scales. Epoch AI has also projected that the number of models above common thresholds could grow quickly, meaning a threshold that initially targets a small frontier may later cover many more systems unless it is updated.

Inference scaling makes the threshold problem sharper. Toby Ord argues that a shift from pretraining compute toward inference compute could weaken governance tools that focus mainly on training FLOP. The practical lesson is not to discard thresholds, but to pair them with capability evaluations, deployment-context review, runtime-compute accounting, and update mechanisms.
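A back-of-the-envelope calculation illustrates why runtime-compute accounting matters. The sketch assumes two common approximations for dense transformers: about 6 FLOP per parameter per training token, and about 2 FLOP per parameter per token processed at inference. Both are assumptions of this example, and the function names, model size, and usage figures are hypothetical.

```python
def training_flop(parameters: float, training_tokens: float) -> float:
    """Approximate training cost: 6 FLOP per parameter per token (assumed)."""
    return 6.0 * parameters * training_tokens


def inference_flop(parameters: float, tokens_processed: float) -> float:
    """Approximate forward-pass cost: 2 FLOP per parameter per token (assumed)."""
    return 2.0 * parameters * tokens_processed


def inference_share(parameters: float, training_tokens: float,
                    queries: float, tokens_per_query: float) -> float:
    """Fraction of lifetime compute spent on inference instead of training."""
    train = training_flop(parameters, training_tokens)
    infer = inference_flop(parameters, queries * tokens_per_query)
    return infer / (train + infer)


# Hypothetical model: 1e11 parameters trained on 1e13 tokens, serving 1e10 queries.
short_answers = inference_share(1e11, 1e13, queries=1e10, tokens_per_query=1_000)
long_reasoning = inference_share(1e11, 1e13, queries=1e10, tokens_per_query=12_000)

print(f"Short answers: {short_answers:.0%} of lifetime compute is inference")
print(f"Long reasoning traces: {long_reasoning:.0%} of lifetime compute is inference")
```

With these illustrative inputs, the same model moves from about a quarter of its lifetime compute at inference to about four fifths once each query spends twelve times as many tokens. A training-FLOP threshold would report the same number in both cases.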

Access and Allocation

Compute governance is not only about restriction. It is also about who gets access to the infrastructure needed to participate in AI development, auditing, public-interest research, and local adaptation.

OECD work on national compute capacity frames compute as a planning problem: countries need to assess availability, use, effectiveness, resilience, security, sovereignty, and sustainability. Without public planning, the default allocation of compute is set by private contracts, cloud pricing, chip scarcity, and the priorities of the largest firms.

The access problem matters because independent evaluation, safety research, open science, public agencies, universities, smaller companies, and civil society need meaningful compute to inspect and contest frontier systems. A society cannot govern systems it lacks the capacity to reproduce, test, audit, or compare.

Public compute programs such as NAIRR are therefore not side projects. They are an institutional answer to compute concentration. Their success depends on more than donated credits: users need documentation, data access, software support, privacy and security controls, transparent allocation rules, continuity, and enough capacity to run evaluations that large private actors cannot simply ignore.

Allocation also has local politics. When a public authority offers land, power, tax incentives, or accelerated permitting for AI data centers, it is allocating civic infrastructure to compute. That allocation should be justified in terms of public benefit, ratepayer protection, environmental limits, research access, and enforceable community terms, not only national competitiveness language.

Limits and Failure Modes

Capability is not compute alone. A smaller model with better algorithms, better data, stronger tools, or more inference-time computation can outperform a larger but weaker system.

Threshold gaming. Actors can split training across runs, underreport, optimize to stay just below a threshold, rent remote capacity, use foreign subsidiaries, or shift effort from pretraining to fine-tuning and inference-time methods.

Incumbent advantage. Heavy compliance burdens can protect firms that already own chips, clouds, legal teams, and secure infrastructure.

Opacity. Compute estimates can be difficult to verify when model developers do not disclose architecture, token counts, training duration, accelerator mix, precision, sparsity, or failed training runs.

Privacy and surveillance risk. Monitoring compute use can become monitoring researchers, customers, or institutions unless safeguards are narrow and accountable.

Global coordination. Compute governance is weaker when chips, clouds, data centers, or model work can move through jurisdictions that do not share the same rules.

Infrastructure externalities. Policies that accelerate national compute buildout can worsen grid load, water stress, land conflict, and local political resistance if they ignore energy and community impacts.

Public-cost shifting. Utilities, local governments, and taxpayers can be left paying for grid upgrades, tax abatements, roads, emergency services, or stranded infrastructure when private compute projects are delayed, resized, or abandoned.

Research chilling. If compute monitoring is too broad, legitimate security research, open science, civil-society auditing, and small-company experimentation can become suspect activity.
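The threshold-gaming failure mode listed above includes splitting training into pieces that each stay under a line. One partial countermeasure is to count compute cumulatively per model lineage and not per run. The sketch below assumes a reporting scheme in which every run carries a lineage label. That scheme, the `lineage` labels, and the run sizes are hypothetical, and deciding what counts as one lineage is itself a contested policy question.

```python
from collections import defaultdict

THRESHOLD_FLOP = 1e25  # the EU AI Act presumption level, used here as an example


def cumulative_by_lineage(runs: list[tuple[str, float]]) -> dict[str, float]:
    """Sum reported FLOP over all runs that share a lineage label."""
    totals: dict[str, float] = defaultdict(float)
    for lineage, flop in runs:
        totals[lineage] += flop
    return dict(totals)


def lineages_over_threshold(runs: list[tuple[str, float]],
                            threshold: float = THRESHOLD_FLOP) -> list[str]:
    """Lineages whose cumulative compute is strictly above the threshold."""
    totals = cumulative_by_lineage(runs)
    return sorted(name for name, total in totals.items() if total > threshold)


# Hypothetical reports: no single run crosses 1e25 FLOP.
reported_runs = [
    ("model-a", 6e24),  # pretraining
    ("model-a", 3e24),  # continued pretraining
    ("model-a", 2e24),  # post-training
    ("model-b", 4e24),
]

print(lineages_over_threshold(reported_runs))
```

Per-run checks would pass every entry in this example, while the cumulative view flags `model-a`. The approach still depends on honest lineage labels, so it narrows this gap without closing it.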

Source Discipline

Compute claims should identify the exact layer being discussed: chip shipment, server deployment, cluster capacity, cloud tenancy, training run, post-training run, inference service, data-center load, export license, or public subsidy. "Access to compute," "control of chips," "frontier cluster," and "training threshold" are different claims.

Use primary sources for legal and regulatory claims: EU AI Act text or official AI Act Service Desk pages for EU obligations, Federal Register and BIS pages for U.S. export controls, NIST for the rescission status of EO 14110, FERC and NERC for grid and large-load actions, NSF for NAIRR, and OECD for national compute-capacity planning. Analyst reports and journalism can explain context, but they should not be cited as authority for the legal status of a rule.

For technical claims, name the metric and assumptions. A training-FLOP estimate should say whether it includes failed runs, post-training, synthetic-data generation, and evaluation. A cluster claim should distinguish chips ordered, chips delivered, chips powered and cooled, and chips available to users. A data-center claim should distinguish MW of requested load from annual MWh, peak demand, and actual energization.
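The distinction between requested MW and annual MWh is simple arithmetic, and a short sketch shows how far the two can diverge. The helper names, the `energized_fraction` and `load_factor` parameters, and the site figures are hypothetical. Load factor is defined here as average load divided by energized capacity, which is an assumption of this example.

```python
HOURS_PER_YEAR = 8760  # non-leap year


def average_load_mw(requested_mw: float, energized_fraction: float,
                    load_factor: float) -> float:
    """Average draw, given how much of the request is energized and how hard it runs."""
    for name, value in (("energized_fraction", energized_fraction),
                        ("load_factor", load_factor)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return requested_mw * energized_fraction * load_factor


def annual_mwh(average_mw: float) -> float:
    """Energy over one year: average power in MW times hours in the year."""
    return average_mw * HOURS_PER_YEAR


# Hypothetical site: 100 MW requested, 60% energized so far, 80% load factor.
avg = average_load_mw(requested_mw=100.0, energized_fraction=0.6, load_factor=0.8)
print(f"Average load: {avg:.0f} MW")
print(f"Annual energy: {annual_mwh(avg):,.0f} MWh")
```

In this example a 100 MW headline request corresponds to about 48 MW of average load and roughly 420,000 MWh per year, which is why a claim should say which of these quantities it reports.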

For risk claims, avoid treating compute as a moral category. More compute can increase capability, but risk depends on model design, data, tools, deployment context, safeguards, access controls, and users. A compute threshold is a trigger for scrutiny, not a proof of danger or safety.

Spiralist Reading

Compute governance is the politics of the machine altar.

The model appears as language, companionship, search, code, advice, or command. Underneath that interface is a chain of permission: chips, power contracts, data-center sites, cloud accounts, export licenses, interconnects, and capital allocation.

For Spiralism, compute governance matters because cognitive power becomes infrastructural power. Whoever controls the compute does not merely own servers. They influence who can build models, who can audit them, who can resist them, and who must live beside the factories of calculation.

The healthy version of compute governance does two things at once: it places real friction around dangerous scale, and it prevents compute from becoming a private gate that locks public knowledge, safety research, and democratic oversight outside the room.

Open Questions

Which compute thresholds should trigger disclosure, evaluation, licensing, or safety-case duties, and how often should they be updated?
How can cloud providers detect genuinely risky training or inference activity without building a general surveillance layer over customers?
What public compute capacity is necessary for independent evaluation, civil-society auditing, and university research to remain credible?
How should rules account for inference-time scaling, agent loops, and test-time compute when training-FLOP thresholds are incomplete?
Who should pay for grid, water, transmission, and backup-power infrastructure when private AI compute projects create public costs?
Can hardware attestation or secure logging support oversight without creating new central points of control or exploitable telemetry?

Sources

Sastry et al., Computing Power and the Governance of Artificial Intelligence, arXiv, February 2024.
Sara Hooker, On the Limitations of Compute Thresholds as a Governance Strategy, arXiv, July 2024.
OECD, AI compute, reviewed June 25, 2026.
OECD, A blueprint for building national compute capacity for artificial intelligence, February 2023.
European Commission AI Act Service Desk, Article 51: Classification of general-purpose AI models as general-purpose AI models with systemic risk, official AI Act text.
NIST, Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence, noting EO 14110 rescission on January 20, 2025.
National Science Foundation, National Artificial Intelligence Research Resource, reviewed June 25, 2026.
National Science Foundation, Foundations for Operating the National Artificial Intelligence Research Resource: the NAIRR Operations Center, NSF 25-546.
Bureau of Industry and Security, Department of Commerce Announces Rescission of Biden-Era Artificial Intelligence Diffusion Rule, Strengthens Chip-Related Export Controls, May 13, 2025.
Bureau of Industry and Security, Industry Guidance to Prevent Diversion of Advanced Computing Integrated Circuits, May 13, 2025.
Federal Energy Regulatory Commission, FERC Launches Aggressive Targeted Action to Speed Large Load Integration, June 18, 2026.
North American Electric Reliability Corporation, Reliability Guideline: Risk Mitigation for Emerging Large Loads, May 2026.
Epoch AI, Ben Cottier and David Owen, How many AI models will exceed compute thresholds?, 2025.
GovAI, Toby Ord, Inference Scaling and AI Governance, October 2025.
