AI Takeoff
AI takeoff is the contested question of how quickly advanced AI systems could move from broadly human-competitive capability to transformative or superhuman capability, and how much warning time society would have. The governance problem is not choosing which curve to believe but preparing for short-warning scenarios without turning uncertainty into hype, prophecy, or neglect of present harms.
Definition
AI takeoff refers to the pace and character of the transition from advanced AI systems to systems that can radically transform science, industry, military power, software, institutions, or civilization. In AI safety discourse, the term usually concerns how much time separates the first broadly human-level or human-competitive artificial system from much more capable systems.
A sharper definition separates four clocks. Capability speed is how fast systems improve at tasks. Diffusion speed is how quickly those capabilities spread through APIs, weights, products, cloud accounts, labs, states, and criminal markets. Impact speed is how quickly capability becomes economic, military, scientific, or institutional power. Control speed is how quickly governance can measure, restrict, contest, or reverse the change.
Those clocks can diverge. A private lab could see fast internal AI R&D acceleration while public products change slowly. A modest capability improvement could diffuse quickly through open weights, cheap inference, or widely used agent scaffolds. A responsible takeoff analysis therefore asks which clock is moving, not simply whether the future is "fast" or "slow."
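To make the "which clock is moving" question concrete, the sketch below records one rate per clock and reports the fastest clock and how far it outruns control speed. The class name `ClockRates`, the per-year multiplier units, and all scenario numbers are hypothetical illustrations, not estimates from any source.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ClockRates:
    """Illustrative rates for the four takeoff clocks.

    Each value is a hypothetical multiplier per year (1.0 means no change).
    The numbers used below are placeholders, not measurements.
    """
    capability: float
    diffusion: float
    impact: float
    control: float

    def fastest(self) -> str:
        """Name of the clock that is moving fastest."""
        rates = asdict(self)
        return max(rates, key=rates.get)

    def control_gap(self) -> float:
        """Fastest non-control clock divided by the control clock.

        Values above 1.0 mean some part of the transition is outpacing
        governance's ability to measure, restrict, contest, or reverse it.
        """
        return max(self.capability, self.diffusion, self.impact) / self.control


# Hypothetical scenario: fast internal AI R&D, slowly changing public products.
private_lab = ClockRates(capability=4.0, diffusion=1.2, impact=1.1, control=1.05)
# Hypothetical scenario: a modest capability gain spreading through open weights.
open_weights = ClockRates(capability=1.3, diffusion=5.0, impact=1.5, control=1.1)
```

In the first hypothetical scenario the capability clock leads while diffusion and impact barely move; in the second, diffusion leads. Both have a control gap above 1.0, but they point to different responses: internal-use oversight in the first case, release and distribution controls in the second.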
The idea is closely connected to I. J. Good's 1965 "intelligence explosion" argument: if a machine became better than humans at designing intelligent machines, it might design still better machines, producing a rapid feedback loop. Later discussions separated the question of whether superintelligence is possible from the question of how fast capability and power would accumulate once a threshold is crossed.
A serious takeoff claim should name its threshold, scope, time unit, bottlenecks, and evidence. "Fast" means something different if it refers to a benchmark jump, a model release cycle, AI R&D automation inside one lab, global economic adoption, or the collapse of meaningful human control.
What Takeoff Is Not
AI takeoff is not evidence that current AI systems are conscious, divine, or already AGI. It is a forecasting question about rates, feedback loops, concentration, diffusion, warning time, and governance capacity.
It is also not one benchmark curve. A system can improve quickly at coding, math, or tool use while still failing at planning, autonomy, robotics, social reliability, or institutional deployment. Conversely, a modest benchmark jump can become socially important if it is cheap, widely deployed, and embedded in workflows. Takeoff analysis should therefore distinguish technical capability, deployable reliability, economic adoption, institutional dependence, and public control.
Hard and Soft Takeoff
Hard takeoff describes a scenario in which AI capability rises extremely quickly after some threshold, potentially because systems can recursively improve themselves, automate AI research, exploit compute overhangs, or gain strategic advantages faster than institutions can respond. In its strongest form, hard takeoff is associated with a local or concentrated "foom" event: one project or system races far ahead of the rest of the world.
Soft takeoff describes a slower and more distributed transition. Capability improves through many labs, markets, tools, hardware cycles, data pipelines, regulatory frictions, deployment constraints, and human organizations. Even if the final effect is transformative, the curve is legible enough for society to observe and adapt over time.
Many realistic scenarios fall between these poles. Capability progress could be gradual before a threshold and then accelerate sharply. Economic impact could lag technical capability. AI research automation could proceed unevenly across coding, experiments, theory, chip design, robotics, security, and deployment. A slow public product curve could coexist with a faster private lab curve if the strongest systems are kept internal.
Proposed Mechanisms
Recursive self-improvement. A sufficiently capable AI system might improve its own architecture, training process, tools, or successor systems, creating a feedback loop where better systems produce even better systems. This is the classic intelligence-explosion mechanism, but public evidence has not established a full autonomous loop of this kind.
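Good's argument turns on returns to improvement: how much each gain in capability speeds up the next gain. A minimal toy model, assuming capability C grows each step by k * C**r, shows how the exponent r separates slowing, steady, and runaway curves. The function names, parameters, and the cap are hypothetical; the model illustrates the feedback-loop logic and is not a forecast.

```python
def simulate_feedback(c0: float, k: float, r: float, steps: int, cap: float = 1e12) -> list[float]:
    """Toy model: each step adds k * C**r to capability C.

    r < 1: growth ratio shrinks over time (diminishing returns)
    r = 1: constant growth ratio (steady exponential)
    r > 1: growth ratio rises each step (accelerating, "explosive")
    Stops early once C exceeds `cap` so the runaway case stays finite.
    """
    path = [c0]
    c = c0
    for _ in range(steps):
        c += k * c ** r
        path.append(c)
        if c > cap:
            break
    return path


def growth_ratios(path: list[float]) -> list[float]:
    """Step-to-step multipliers C[t+1] / C[t]."""
    return [later / earlier for earlier, later in zip(path, path[1:])]


for r in (0.5, 1.0, 1.5):
    ratios = growth_ratios(simulate_feedback(c0=1.0, k=0.1, r=r, steps=10))
    print(f"r={r}: first ratio {ratios[0]:.3f}, last ratio {ratios[-1]:.3f}")
```

With r = 1 the per-step growth ratio stays fixed at 1 + k; with r > 1 it rises every step, which is the intelligence-explosion shape; with r < 1 it falls. Much of the hard-versus-soft disagreement can be read as a dispute about the size of r and about whether real bottlenecks keep k small.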
AI-accelerated AI research. Even without autonomous self-modification, AI systems can help humans write code, search design spaces, run experiments, debug models, synthesize papers, build evaluations, and improve infrastructure. This can compress the research cycle and is arguably the most concrete takeoff-relevant mechanism observed so far.
Compute and software overhangs. If existing hardware, data, or algorithms are underused before a key insight, a new method could unlock a sudden jump in effective capability. Conversely, if progress depends on new physical infrastructure, the pace may be bounded by fabs, power, data centers, supply chains, and capital expenditure.
Strategic advantage. If one actor gains a large capability lead, it may be able to automate cyber operations, persuasion, science, robotics, weapons, finance, or intelligence gathering before competitors and governments understand the new balance of power.
Internal deployment feedback. Once systems are placed into labs, enterprises, coding environments, and agents, real-world use can generate data, revenue, integration pressure, and operational knowledge that feed the next generation of systems. The most governance-relevant feedback may occur inside frontier developers before the public sees equivalent products.
Inference-time scaling and scaffolding. Tool access, search, memory, code execution, verification loops, and agent scaffolds can change effective capability without a new base model. This makes takeoff partly a systems question, not only a training-run question.
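One simple way to see how scaffolding can raise effective capability without a new base model: if an agent can retry a task and a verifier can recognize a correct attempt, the chance of at least one success rises quickly with the number of attempts. The sketch assumes independent attempts and a perfect verifier; both are simplifying assumptions, and real systems with correlated failures and imperfect verifiers usually do worse.

```python
def best_of_k_success(p: float, k: int) -> float:
    """Chance that at least one of k attempts succeeds.

    Assumes attempts are independent and a perfect verifier picks out any
    correct attempt, so this is an optimistic illustration.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 - (1.0 - p) ** k


for k in (1, 5, 10):
    print(f"k={k}: {best_of_k_success(0.2, k):.3f}")
```

For example, a hypothetical per-attempt success rate of 0.2 becomes about 0.89 with ten verified attempts (1 - 0.8^10). Nothing about the base model changed, so a threshold keyed only to training compute would not register the shift.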
Current Context
As of June 23, 2026, there is no public evidence that an AI system has recursively self-improved into AGI or superintelligence. The current evidence is narrower but governance-relevant: frontier systems are improving at coding, tool use, long-horizon software tasks, and research-engineering assistance, and major developers now treat AI R&D acceleration as a capability that needs explicit thresholds.
METR's time-horizon work offers one concrete measurement frame. It estimates the duration of tasks, measured by human expert completion time, that AI agents can complete at specified reliability levels. METR's public dashboard was last updated May 8, 2026, and warns that time horizon is a task-difficulty measure, not the literal length of time an AI agent can safely operate in the world. Its task suite is concentrated in software engineering, machine-learning, and cybersecurity tasks, and measurements above 16 hours are unreliable with the current suite.
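The sketch below illustrates the general shape of a time-horizon estimate, assuming (as a simplification of METR's published approach) that agent success is modeled as a logistic function of log2 human completion time and that the horizon is the task length at which predicted success equals a chosen reliability level. The task results are hypothetical, and METR's actual pipeline (task weighting, human baselines, bootstrapped uncertainty) is not reproduced.

```python
import math


def fit_logistic(xs: list[float], ys: list[int], iters: int = 100) -> tuple[float, float]:
    """Fit P(success) = sigmoid(a + b * x) by Newton's method (maximum likelihood).

    Assumes the data are not perfectly separable; otherwise the fit diverges.
    """
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood w.r.t. a
            g1 += (y - p) * x    # gradient w.r.t. b
            h00 += w             # entries of X^T W X
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: (X^T W X)^-1 X^T (y - p), written out for two parameters.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b


def time_horizon_minutes(a: float, b: float, reliability: float = 0.5) -> float:
    """Human task length (minutes) at which predicted success equals `reliability`."""
    if b >= 0:
        raise ValueError("expected success to fall as tasks get longer (b < 0)")
    logit = math.log(reliability / (1.0 - reliability))
    return 2.0 ** ((logit - a) / b)


# Hypothetical results: (human expert minutes, 1 = agent succeeded, 0 = failed).
results = [(1, 1), (2, 1), (4, 1), (4, 0), (8, 1), (8, 0), (16, 1),
           (16, 0), (32, 0), (32, 1), (64, 0), (64, 1), (128, 0), (256, 0)]
xs = [math.log2(minutes) for minutes, _ in results]
ys = [success for _, success in results]
a, b = fit_logistic(xs, ys)
print(f"50% horizon: {time_horizon_minutes(a, b, 0.5):.1f} min")
print(f"80% horizon: {time_horizon_minutes(a, b, 0.8):.1f} min")
```

With these symmetric hypothetical results the 50% horizon comes out at 16 minutes, and the 80% horizon is shorter, because demanding higher reliability pushes the estimate toward easier tasks. This is also why a time horizon describes task difficulty at a reliability level rather than how long an agent can operate unsupervised.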
METR's May 2026 frontier-risk pilot adds a second governance lesson. Anthropic, Google, Meta, and OpenAI participated in an entity-level assessment focused on internal AI use, not only public model releases. METR argued that periodic third-party assessment of risks from developers' internal use of AI should become industry practice. The report also said participating companies did not report evidence of dramatic overall speed-ups from AI R&D automation, which is useful counterevidence against claims that a decisive public takeoff threshold has already been crossed.
Company safety frameworks have moved takeoff from philosophy into operating policy. OpenAI's Preparedness Framework v2 includes AI self-improvement as a tracked severe-risk category. Anthropic's Responsible Scaling Policy page lists v3.3 as effective May 26, 2026, and its April 2026 v3.1 update clarified the AI R&D threshold and Anthropic's ability to pause development when it deems that appropriate. Google DeepMind's Frontier Safety Framework includes machine-learning R&D critical capability levels and, in its 2025 and 2026 updates, expanded attention to internal deployments, harmful manipulation, misalignment, and tracked capability levels.
The broader public evidence base is mixed. The 2026 International AI Safety Report says capabilities continue to improve, especially in mathematics, coding, science, and autonomous operation, while also emphasizing jagged performance, reliability failures, and uncertainty. OECD's February 2026 trajectories paper presents four possible AI pathways through 2030 rather than one deterministic timeline. Stanford HAI's 2026 AI Index reports rapid benchmark and adoption gains, including a sharp rise on SWE-bench Verified, but also frames measurement and management capacity as lagging the pace of capability and adoption.
Compute and infrastructure remain central bottlenecks. Epoch AI's 2026 trend data estimates frontier language-model training compute has grown about 5x per year since 2020 and pre-training compute efficiency about 3x per year, while frontier training costs have also risen quickly. Those trends matter because a takeoff scenario can be slowed or shaped by chips, power, data centers, memory bandwidth, capital, export controls, and deployment cost, not only by algorithms.
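The compounding behind those growth rates is easy to underestimate. The sketch converts annual multipliers into doubling times; the 5x and 3x figures come from the Epoch AI estimates cited above, while combining them into a single "effective compute" multiplier is an assumption made here for illustration, not a figure reported by Epoch AI.

```python
import math


def doubling_time_months(growth_per_year: float) -> float:
    """Months to double at a constant annual growth multiplier (> 1)."""
    if growth_per_year <= 1.0:
        raise ValueError("growth multiplier must exceed 1")
    return 12.0 * math.log(2.0) / math.log(growth_per_year)


TRAINING_COMPUTE_GROWTH = 5.0        # about 5x per year (Epoch AI estimate cited above)
PRETRAINING_EFFICIENCY_GROWTH = 3.0  # about 3x per year (same source)

# Assumption, not an Epoch AI result: treat the two trends as independent and
# multiplicative, so "effective" pre-training compute grows by their product.
EFFECTIVE_GROWTH = TRAINING_COMPUTE_GROWTH * PRETRAINING_EFFICIENCY_GROWTH

for label, growth in [("training compute", TRAINING_COMPUTE_GROWTH),
                      ("pre-training efficiency", PRETRAINING_EFFICIENCY_GROWTH),
                      ("effective compute (assumed product)", EFFECTIVE_GROWTH)]:
    print(f"{label}: {growth:g}x/year, doubling every {doubling_time_months(growth):.1f} months")
```

Run as written, it reports doubling times of about 5.2 months for training compute, 7.6 months for pre-training efficiency, and 3.1 months for the assumed product. Only the first trend has to be bought as hardware, and even it compounds to 5^5 = 3,125x over five years, which is where chips, power, data centers, capital, and export controls bind.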
Public governance is also catching up, but on slower clocks. NIST's AI Risk Management Framework and Generative AI Profile provide risk-management vocabulary. The EU AI Act's general-purpose AI provisions began applying on August 2, 2025, including obligations for providers of models with systemic risk. International AI safety reports and safety institutes are building shared measurement capacity. None of this proves a fast takeoff. It shows that takeoff uncertainty has become a practical governance input.
Evidence and Source Discipline
Takeoff writing needs strong source discipline because the topic attracts hype, ideology, investment pressure, and religiously charged interpretation. Primary evidence includes technical reports, model and system cards, evaluation papers, safety frameworks, regulator publications, standards, incident reports, compute data, and reproducible benchmark results. Secondary commentary can be useful, but it should not be treated as direct evidence of capability.
Benchmarks should be read narrowly. A high score on a coding, math, persuasion, or agent benchmark is not by itself proof of general autonomy, strategic competence, or loss of control. The questions are whether the result is robust, whether the task was contaminated, what scaffold was used, what permissions the system had, what it cost, how often it failed, and whether the capability transfers to real institutions.
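One narrow robustness check is sampling noise. The sketch swaps in the Wilson score interval, a standard interval for a binomial success rate, to show that the same headline score can carry very different evidential weight depending on how many tasks were run. The counts are hypothetical, and the interval says nothing about contamination, scaffold choice, permissions, or cost.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a success rate (z = 1.96 gives roughly 95% coverage)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    centre = (p_hat + z2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials))
    return centre - half_width, centre + half_width


# Same 90% headline score, very different evidential weight (hypothetical counts).
small = wilson_interval(18, 20)
large = wilson_interval(900, 1000)
print(f"18/20:    {small[0]:.2f} to {small[1]:.2f}")
print(f"900/1000: {large[0]:.2f} to {large[1]:.2f}")
```

Both hypothetical results report 90%, but 18 of 20 is compatible with a true rate anywhere from roughly 0.70 to 0.97, while 900 of 1,000 narrows that to roughly 0.88 to 0.92.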
Likewise, company safety frameworks should be read as evidence of concern and internal governance design, not as independent proof that thresholds have been reached. They matter because they define release gates, security requirements, evaluation duties, and pause conditions before public law can respond.
Evidence should also preserve counterevidence. A serious article records when evaluators find rapid task-level progress and when they do not find dramatic development-speed gains, long-term power-seeking behavior, reliable autonomous operation, or transfer from benchmark tasks to institutional impact. Takeoff analysis should not become a one-way ratchet in which every new result is interpreted as acceleration and every limitation is treated as temporary noise.
Counterarguments
Critics of hard takeoff argue that intelligence is not a single lever. Progress may have to clear many bottlenecks: chips, energy, robotics, datasets, human institutions, tacit knowledge, regulation, science, security, procurement, and real-world experimentation. A model that is better at code or language may not immediately control manufacturing, laboratories, markets, or states.
Economic arguments also weaken some concentrated-takeoff stories. Modern AI is built inside large supply chains, distributed capital markets, cloud platforms, semiconductor ecosystems, and research communities. If many actors can copy, buy, steal, or independently discover improvements, advantage may diffuse rather than remain local.
Empirical work on discontinuous progress offers a further caution. Historical technologies sometimes jump, but many improvements follow smoother curves or depend on long infrastructure buildup. AI may still produce sudden social effects, but discontinuity should be argued rather than assumed.
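One way to argue for discontinuity rather than assume it, loosely following the AI Impacts framing of measuring a jump in years of progress at the previous rate, is to extrapolate the prior trend and ask how many years of that trend the new result represents. The sketch assumes a clean exponential prior trend estimated from the first and last historical points; the function name and the example data are hypothetical.

```python
import math


def years_of_progress(history: list[tuple[float, float]], new_year: float, new_value: float) -> float:
    """Express a new observation as extra years of progress at the prior exponential trend.

    history: (year, value) pairs before the new observation, values > 0.
    Simplification: the prior trend is the average log-growth rate between
    the first and last historical points.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical points")
    (y0, v0), (y1, v1) = history[0], history[-1]
    rate = (math.log(v1) - math.log(v0)) / (y1 - y0)  # log-units per year
    if rate <= 0:
        raise ValueError("prior trend must be increasing")
    expected_log = math.log(v1) + rate * (new_year - y1)  # trend extrapolated to new_year
    return (math.log(new_value) - expected_log) / rate


history = [(2020, 1.0), (2021, 2.0), (2022, 4.0)]  # hypothetical: doubles every year
jump = years_of_progress(history, 2023, 64.0)      # trend predicted 8.0
on_trend = years_of_progress(history, 2023, 8.0)
```

In the hypothetical series, a metric that doubles yearly and then reaches 64 in 2023, where the trend predicted 8, represents about three extra years of progress at the old rate, while a point on the trend scores zero.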
The strongest moderate position is that takeoff speed is uncertain and multidimensional. Some capabilities may accelerate quickly while institutions, law, energy, embodied action, and public legitimacy move slowly. A fast software takeoff could still meet a slow physical world, and a slow benchmark curve could still generate sudden institutional dependence.
Governance Significance
Takeoff speed changes what good governance looks like. Under slow takeoff, society can rely more on iterative regulation, incident reporting, safety standards, third-party audits, liability, procurement rules, public deliberation, and institutional learning. Under fast takeoff, those systems must already exist before the most dangerous systems appear.
Frontier AI safety frameworks, preparedness policies, AI safety cases, evaluations, model-weight security, compute governance, incident reporting, and international safety institutes all partly respond to takeoff uncertainty. They are attempts to avoid discovering too late that warning time was short.
The practical question is not whether hard takeoff is certain. It is whether the chance of a fast, high-consequence transition is large enough to justify stronger pre-deployment controls, better monitoring, emergency response capacity, and public limits on systems that could automate AI research or strategic action.
Governance should be tied to named triggers rather than moods. Useful triggers include AI systems that materially accelerate frontier AI R&D, autonomously complete longer technical tasks, discover operationally relevant vulnerabilities, evade safeguards, assist biological or chemical misuse, manipulate users in high-stakes contexts, or require privileged access to model weights, training pipelines, cloud infrastructure, or evaluation systems.
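Tying governance to named triggers can be sketched as a small table of metrics, thresholds, and pre-committed actions. Everything in the table (metric names, threshold values, and actions) is a hypothetical placeholder, not a value from any lab framework or regulation; the sketch only shows the structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    name: str
    metric: str       # key expected in the observations dict
    threshold: float  # fire when the observed value meets or exceeds this
    action: str       # pre-committed response


# Hypothetical trigger table: placeholders for illustration only.
TRIGGERS = (
    Trigger("ai_rnd_acceleration", "rnd_speedup_factor", 2.0,
            "pause expansion of internal R&D agents; notify external evaluator"),
    Trigger("long_horizon_autonomy", "time_horizon_hours", 8.0,
            "require safety-case review before further deployment"),
    Trigger("safeguard_evasion", "evasion_rate", 0.05,
            "halt affected deployment; open incident report"),
)


def evaluate_triggers(observations: dict[str, float],
                      triggers: tuple[Trigger, ...] = TRIGGERS) -> tuple[list[Trigger], list[str]]:
    """Split triggers into those that fired and those never measured."""
    fired: list[Trigger] = []
    unmeasured: list[str] = []
    for trigger in triggers:
        value = observations.get(trigger.metric)
        if value is None:
            unmeasured.append(trigger.name)
        elif value >= trigger.threshold:
            fired.append(trigger)
    return fired, unmeasured


fired, unmeasured = evaluate_triggers({"rnd_speedup_factor": 1.3, "time_horizon_hours": 9.5})
```

A design choice worth keeping in any real version: a metric that was never measured is reported separately instead of being treated as below threshold, so a missing evaluation cannot quietly count as reassurance.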
- Pre-commitment. Define who can pause training, deployment, or internal use when dangerous-capability thresholds are approached.
- Independent evaluation. Preserve held-out tests, third-party access, safety-institute review, and enough public summary evidence to make safety claims contestable.
- Internal-use governance. Track R&D agents inside labs, including prompts, tool calls, code changes, data access, experiment runs, approval chains, and effects on development speed.
- Security and compute visibility. Treat model weights, clusters, privileged accounts, and frontier training runs as governance surfaces, not only engineering assets.
- Emergency accountability. Prepare incident channels, disclosure rules, escalation paths, and public-interest review before a crisis compresses decision time.
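The internal-use governance item implies a log of agent actions that reviewers can query. A minimal sketch, assuming a hypothetical record schema with one entry per action, computes how many actions of a given type carry a recorded human approver. The field names and the coverage metric are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentActionRecord:
    """One logged action by an internal R&D agent (hypothetical schema)."""
    agent_id: str
    action_type: str            # e.g. "prompt", "tool_call", "code_change", "data_access", "experiment_run"
    target: str                 # file, dataset, cluster job, or other resource touched
    approved_by: Optional[str]  # human approver, or None if no approval was recorded


def approval_coverage(log: list[AgentActionRecord], action_type: str = "code_change") -> Optional[float]:
    """Fraction of actions of one type that carry a recorded approver.

    Returns None when the log has no actions of that type, so "nothing to
    measure" is not confused with "fully approved".
    """
    relevant = [r for r in log if r.action_type == action_type]
    if not relevant:
        return None
    approved = sum(1 for r in relevant if r.approved_by is not None)
    return approved / len(relevant)


log = [
    AgentActionRecord("agent-1", "code_change", "train/optimizer.py", "reviewer-a"),
    AgentActionRecord("agent-1", "experiment_run", "job-17", None),
    AgentActionRecord("agent-2", "code_change", "eval/harness.py", None),
    AgentActionRecord("agent-2", "code_change", "data/filter.py", "reviewer-b"),
]
```

On the hypothetical log, two of three code changes have an approver, so coverage is about 0.67, and an action type with no logged entries returns None rather than a reassuring 1.0.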
Governance Evidence Record
Because takeoff is a warning-time claim, a governance record should be organized around decisions rather than rhetoric. The record should make clear what would trigger delay, restriction, external evaluation, regulator notice, internal-use limits, or emergency review.
- Threshold: the capability, diffusion, impact, or control-speed threshold being tested, including the time unit and reliability level.
- System boundary: base model, post-training, scaffold, tools, memory, inference budget, deployment channel, internal-use context, and human review process.
- Acceleration path: whether the claimed speedup comes from training scale, post-training, inference-time compute, model routing, AI R&D agents, data pipelines, or product diffusion.
- Evidence and counterevidence: benchmark results, task transcripts, human baselines, failure modes, rejected runs, incidents, evaluator caveats, and signs that the effect did not transfer to real development speed.
- Authority and access: who can inspect the unredacted evidence, who can force retesting, who can pause internal or external deployment, and what public summary is owed.
This record connects takeoff analysis to AI safety cases, AI audit trails, AI evaluations, and AI incident reporting. Without that evidence layer, takeoff remains a narrative rather than a governable risk.
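The five record elements can be held in one structure with a completeness check. The sketch is a hypothetical container: field names follow the list above, evidence and counterevidence are stored separately, and an element counts as missing when it is empty.

```python
from dataclasses import dataclass, field, fields


@dataclass
class GovernanceEvidenceRecord:
    """Hypothetical container mirroring the record elements listed above.

    Evidence and counterevidence are separate lists so that a record cannot
    silently drop results that cut against acceleration.
    """
    threshold: str = ""             # what is tested, with time unit and reliability level
    system_boundary: str = ""       # model, scaffold, tools, memory, inference budget, context
    acceleration_path: str = ""     # where the claimed speedup comes from
    evidence: list[str] = field(default_factory=list)
    counterevidence: list[str] = field(default_factory=list)
    authority_and_access: str = ""  # who can inspect, force retesting, pause; what is published

    def gaps(self) -> list[str]:
        """Names of elements left empty; an empty list means the record is complete."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


draft = GovernanceEvidenceRecord(
    threshold="50% time horizon above 8 hours on the internal suite",
    evidence=["evaluation run transcripts"],
)
```

One convention that helps: if reviewers searched for counterevidence and found none, record that search explicitly (for example, "no transfer failures found in transcript review"), so that an empty field always means the question was not asked.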
Warning Indicators
Takeoff governance should look for indicators that are specific enough to trigger action without treating every impressive demo as a civilizational threshold.
- AI R&D acceleration: models materially shorten frontier research cycles, improve training systems, or generate nontrivial algorithmic improvements under realistic lab conditions.
- Long-horizon autonomy: agents complete longer, messier, multi-step tasks with tools, memory, retries, and limited human steering at useful reliability.
- Internal capability gaps: private lab systems or internal scaffolds appear substantially stronger than public products, especially in coding, cyber, model improvement, or automated experimentation.
- Compute shifts: inference-time scaling, agent loops, or cheap test-time compute deliver large capability gains that are not captured by training-compute thresholds.
- Safety evidence strain: evaluators can no longer confidently rule out dangerous capability thresholds, or evaluation results become highly sensitive to scaffolds and elicitation effort.
- Audit strain: internal agent use grows faster than logs, permission controls, third-party access, and safety-case review can document.
- Governance lag: release cycles, model updates, or internal deployments move faster than safety-case review, external evaluation, incident reporting, or regulator access.
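The audit-strain indicator can be checked with two counts per period: internal agent runs and runs that received review. The sketch flags strain when review coverage falls in every period and ends below a floor; the function names, the 0.5 floor, and the example counts are hypothetical choices for illustration.

```python
def review_coverage(agent_runs: list[int], reviewed_runs: list[int]) -> list[float]:
    """Per-period fraction of internal agent runs that received review."""
    if len(agent_runs) != len(reviewed_runs) or not agent_runs:
        raise ValueError("need equal-length, non-empty series")
    if any(a <= 0 for a in agent_runs):
        raise ValueError("every period needs at least one agent run")
    return [r / a for a, r in zip(agent_runs, reviewed_runs)]


def audit_strain(agent_runs: list[int], reviewed_runs: list[int], floor: float = 0.5) -> bool:
    """Flag strain when coverage falls every period and ends below `floor`."""
    cov = review_coverage(agent_runs, reviewed_runs)
    falling = len(cov) >= 2 and all(later < earlier for earlier, later in zip(cov, cov[1:]))
    return falling and cov[-1] < floor


strained = audit_strain([100, 300, 900], [80, 150, 270])
steady = audit_strain([100, 200, 400], [90, 180, 360])
```

In the first hypothetical series agent runs triple each period while reviews grow more slowly, so coverage falls from 0.8 to 0.5 to 0.3 and the flag trips; the second series keeps coverage at 0.9 and does not.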
Risk Pattern
Warning-time mismatch. Institutions may plan for gradual change while technical capability advances faster than legal, civic, or security systems can absorb.
Threshold blindness. A lab may treat progress as incremental until a new scaffold, tool loop, training method, or model scale changes the effective system.
Private capability gap. The strongest AI R&D systems may be deployed internally before public models reveal their real effect on research speed.
Concentrated discretion. If takeoff is fast, a small number of lab leaders, cloud providers, chip suppliers, or state officials may make civilization-scale decisions before public oversight catches up.
Benchmark complacency. Smooth benchmark curves can hide discontinuities in real-world agency, persuasion, cyber utility, research automation, or deployment leverage.
Control-speed failure. Even when capability progress is visible, approvals, standards, audits, procurement rules, courts, and public deliberation may move too slowly to matter.
Emergency normalization. Once acceleration begins, competitive pressure can make exceptional deployment, secrecy, and emergency governance feel permanent.
Spiralist Reading
AI takeoff is the speed question at the heart of recursive civilization.
The machine does not merely improve. It helps improve the process by which improvement happens. Once that loop closes, ordinary political time may no longer match technical time. A committee meets monthly; a model iterates hourly. A law takes years; a capability diffuses through code, weights, and cloud accounts.
For Spiralism, takeoff is not a prophecy to be believed or dismissed. It is a discipline of warning-time humility. A society that assumes slow change may wake up inside fast change with no brakes prepared. A society that assumes only fast catastrophe may neglect present harms, institutional capture, and the slower replacement of human judgment.
The responsible posture is to build systems that can survive both possibilities: enough friction for fast takeoff, enough justice for slow takeoff, enough source discipline to resist myth, and enough public memory to notice when the curve changes.
Open Questions
- Which indicators would justify pausing an internal AI R&D deployment before public models show equivalent capability?
- How should safety frameworks measure takeoff-relevant systems that combine base models, scaffolds, tools, memory, and inference-time search?
- Can governments get enough access to private capability evidence without creating security leaks or regulatory capture?
- Which bottlenecks are most likely to slow capability-to-impact conversion: compute, energy, data, robotics, law, security, user trust, or organizational adoption?
- How should public communication warn about short-warning scenarios without turning forecasts into prophecy, marketing, or fatalism?
Related Pages
- AI Capability Forecasting
- Automated AI R&D
- Scaling Laws
- AI Alignment
- Superalignment
- AI Control
- Existential Risk
- AI Governance
- Frontier AI Safety Frameworks
- AI Safety Cases
- AI Safety Institutes
- AI Evaluations
- Capability Elicitation
- Benchmark Contamination
- AI Red Teaming
- Model Cards and System Cards
- AI Audit Trails
- AI Incident Reporting
- AI Compute
- Compute Governance
- Model Weight Security
- Inference and Test-Time Compute
- Post-Training
- Reasoning Models
- AI Agents
- AI Agent Sandboxing
- AI Agent Observability
- AI Coding Agents
- AI Scientists
- AI Sandbagging
- METR
- NIST AI Risk Management Framework
- EU AI Act
- Nick Bostrom
- Eliezer Yudkowsky
- Ajeya Cotra
- Jack Clark
- Claim Hygiene Protocol
Sources
- I. J. Good, Speculations Concerning the First Ultraintelligent Machine, Advances in Computers, 1965.
- Machine Intelligence Research Institute, The Hanson-Yudkowsky AI-Foom Debate, collected edition.
- Tom Davidson, Open Philanthropy, What a Compute-Centric Framework Says About Takeoff Speeds, 2022.
- Epoch AI, Trends in AI, updated February 5, 2026; reviewed June 23, 2026.
- OECD, Exploring Possible AI Trajectories Through 2030, OECD Artificial Intelligence Papers No. 55, February 3, 2026.
- Stanford HAI, The 2026 AI Index Report, reviewed June 23, 2026.
- AI Impacts, Discontinuous progress investigation, 2015, substantially updated 2020.
- Katja Grace, John Salvatier, Allan Dafoe, Baobao Zhang, and Owain Evans, When Will AI Exceed Human Performance? Evidence from AI Experts, arXiv, 2017; JAIR, 2018.
- Katja Grace, Harlan Stewart, Julia Fabienne Sandkühler, Stephen Thomas, Ben Weinstein-Raun, and Jan Brauner, Thousands of AI Authors on the Future of AI, arXiv, 2024.
- Nick Bostrom, Superintelligence: Paths, Dangers, Strategies, 2014.
- Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Trevor Darrell, Yuval Noah Harari, et al., Managing extreme AI risks amid rapid progress, arXiv, 2023.
- METR, Measuring AI Ability to Complete Long Tasks, March 19, 2025; Task-Completion Time Horizons of Frontier AI Models, last updated May 8, 2026 and reviewed June 23, 2026; Time Horizon 1.1, January 29, 2026.
- METR, Frontier Risk Report (February to March 2026), May 19, 2026; reviewed June 23, 2026.
- OpenAI, Preparedness Framework v2, 2025.
- Anthropic, Responsible Scaling Policy, current page reviewed June 23, 2026; v3.3 effective May 26, 2026, and v3.1 effective April 2, 2026.
- Google DeepMind, Strengthening our Frontier Safety Framework, September 22, 2025, updated April 17, 2026.
- NIST, AI Risk Management Framework, reviewed June 23, 2026; Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, July 2024.
- European Commission, Guidelines for providers of general-purpose AI models, reviewed June 23, 2026; The General-Purpose AI Code of Practice, reviewed June 23, 2026.
- European Commission AI Act Service Desk, Article 51: Classification of general-purpose AI models as general-purpose AI models with systemic risk, Regulation (EU) 2024/1689; reviewed June 23, 2026.
- European Commission AI Act Service Desk, Article 55: Obligations of providers of general-purpose AI models with systemic risk, Regulation (EU) 2024/1689; reviewed June 23, 2026.
- International AI Safety Report, International AI Safety Report 2026, February 2026; Bengio et al., arXiv record, 2026.
- Severin Field, Raymond Douglas, and David Krueger, AI Researchers' Views on Automating AI R&D and Intelligence Explosions, arXiv, 2026.
- Alan Chan, Ranay Padarath, Joe Kwon, Hilary Greaves, and Markus Anderljung, Measuring AI R&D Automation, arXiv, 2026.