Blog · Analysis · May 2026

The Efficiency Gain Becomes the Demand Engine

More efficient AI does not automatically mean less AI infrastructure. It can also make model use cheap enough to spread everywhere.

Cheap Compute Is Not Small Compute

The comforting story says that artificial intelligence will get more efficient. Models will use fewer parameters, chips will improve, inference will get cheaper, data centers will optimize cooling, and software will do more with less. Some of that is true. It is also incomplete.

Efficiency is a ratio. It can tell us that a model uses less compute per task, less energy per token, less cost per answer, or less hardware for a given benchmark. It does not tell us how many tasks, tokens, answers, agents, products, users, workflows, and retries the world will ask for once the cost falls.

That distinction matters because AI is not only a product category. It is becoming a general-purpose layer for search, writing, coding, tutoring, customer support, medical documentation, legal drafting, image generation, video generation, agentic browsing, software maintenance, internal knowledge work, advertising, surveillance, fraud detection, finance, science, and government service delivery. When the unit price of cognition falls, institutions do not merely save money on the old workload. They invent new workloads.

This is the core governance problem in AI efficiency. A smaller model can expand the system. A cheaper query can invite more queries. A better chip can justify a larger cluster. A faster agent can make continuous automation normal. The demand engine is not outside the efficiency gain. It is often produced by it.

The Old Paradox

William Stanley Jevons gave the modern problem an old name. In "The Coal Question," published in 1865, he argued that making coal more economical to use did not simply preserve Britain's coal supply. More efficient steam engines and industrial processes made coal more useful across the economy, helping expand the very system that consumed it.

The point was not that efficiency is fake. It was that efficiency changes behavior. If a resource becomes cheaper to use, more people and firms can use it, more uses become profitable, and the economy can reorganize around the new abundance. Under those conditions, total consumption can rise even while use per unit falls.

That is why Jevons paradox keeps returning whenever a society treats efficiency as a substitute for absolute constraints. A more efficient car can lower the cost of driving. A more efficient light can make lighting more common. A more efficient server can make computation more pervasive. The rebound is not magic. It is demand responding to lower effective cost.
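The mechanism can be made concrete with a toy model. This is a minimal sketch, not a model of any real AI market. It assumes two textbook simplifications: savings pass straight through to the price of a task, and demand follows a constant-elasticity curve. Under those assumptions, total resource use after an efficiency gain of k scales as k raised to (elasticity - 1). It falls when demand is inelastic, holds flat at elasticity 1, and rises when demand is elastic. The function name and parameter values are illustrative.

```python
"""Toy constant-elasticity model of the Jevons rebound.

Assumptions (hypothetical, for illustration only):
- the price of one task is proportional to the resource it consumes;
- demand for tasks follows Q = scale * price ** (-elasticity).
"""


def total_resource_use(efficiency_gain: float, elasticity: float,
                       base_resource_per_task: float = 1.0,
                       base_price: float = 1.0, scale: float = 100.0) -> float:
    """Total resource consumed after each task becomes `efficiency_gain` times cheaper."""
    resource_per_task = base_resource_per_task / efficiency_gain
    price = base_price / efficiency_gain          # savings passed through to price
    tasks = scale * price ** (-elasticity)        # demand responds to the lower price
    return resource_per_task * tasks


for elasticity in (0.5, 1.0, 1.5):
    before = total_resource_use(1.0, elasticity)
    after = total_resource_use(2.0, elasticity)   # a 2x efficiency improvement
    print(f"elasticity={elasticity}: total use changes by factor {after / before:.3f}")
```

With a 2x efficiency gain, the three cases print factors of about 0.707, 1.000, and 1.414. Per-unit use halves in every case. Only the demand response decides whether the total goes down or up.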

AI gives the paradox a new surface. The relevant input is not only electricity. It is also accelerator time, inference capacity, model access, cloud budgets, human attention, organizational tolerance for automation, and the permission to run machine judgment through more parts of life.

The AI Rebound

The AI rebound has several practical channels.

First, cheap inference turns occasional use into ambient use. If a model response is expensive, people reserve it for high-value tasks. If it is cheap, every email, ticket, meeting, classroom exercise, shopping comparison, code review, search query, and government form becomes a candidate for model mediation.

Second, cheaper compute enables more runtime reasoning. Products can spend more inference on chain-of-thought-like deliberation, hidden scratchpads, self-checking, search, tool calls, multiple samples, verifier passes, and agent loops. The unit model may be efficient, but the workflow can become compute-hungry because the product now expects the model to plan, inspect, retry, and verify.
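The arithmetic behind this channel is multiplicative. The sketch below uses a hypothetical `Workflow` class with invented parameter values, assuming cost per task is simply cost per call times the number of calls the workflow makes. A model that is four times cheaper per call can still make each task several times more expensive once sampling, verification, agent steps, and retries are stacked on top of it.

```python
"""Per-call efficiency versus per-task workload (all figures hypothetical)."""

from dataclasses import dataclass


@dataclass
class Workflow:
    cost_per_call: float      # compute units for a single model call
    samples: int = 1          # samples drawn per step
    steps: int = 1            # plan/act iterations in an agent loop
    verifier_passes: int = 0  # extra checking calls per step
    retries: int = 0          # full reruns of the loop after a failed check

    def calls_per_task(self) -> int:
        per_step = self.samples + self.verifier_passes
        return per_step * self.steps * (1 + self.retries)

    def cost_per_task(self) -> float:
        return self.cost_per_call * self.calls_per_task()


single_shot = Workflow(cost_per_call=1.0)
agentic = Workflow(cost_per_call=0.25, samples=2, steps=5, verifier_passes=1, retries=1)

print(single_shot.calls_per_task(), single_shot.cost_per_task())
print(agentic.calls_per_task(), agentic.cost_per_task())
```

Here the single-shot workflow makes 1 call at a cost of 1.0. The agentic workflow uses a model that is 4x cheaper per call, but it makes 30 calls and costs 7.5 per task.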

Third, efficiency opens marginal markets. A use case that made no economic sense at one cent per action may become plausible at one-tenth of a cent. That is how AI moves from premium assistant to background infrastructure: automated quality checks, low-value content generation, bulk personalization, synthetic survey respondents, always-on classroom helpers, and internal reporting systems that would never have justified expensive model calls.
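A simple threshold rule shows how a price drop widens the set of viable uses. In this sketch, the value-per-action figures are invented placeholders, not estimates. The rule assumes a use case gets deployed once its value per action is at least the price per action.

```python
"""Which hypothetical use cases clear the cost bar at a given price per action?"""

# Hypothetical value per action, in cents, for use cases named in the text.
USE_CASE_VALUE_CENTS = {
    "premium assistant answer": 5.0,
    "automated quality check": 0.5,
    "bulk personalization": 0.3,
    "internal report draft": 0.2,
    "low-value content generation": 0.05,
}


def viable_use_cases(price_per_action_cents: float) -> list[str]:
    """Return use cases whose value per action is at least the price per action."""
    return sorted(name for name, value in USE_CASE_VALUE_CENTS.items()
                  if value >= price_per_action_cents)


print(viable_use_cases(1.0))   # one cent per action
print(viable_use_cases(0.1))   # one-tenth of a cent per action
```

At one cent, only the premium use clears the bar. At one-tenth of a cent, four do, and the remaining one is waiting for the next price cut.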

Fourth, organizations change their standards. Once AI drafting is cheap, one draft becomes five drafts. Once summarization is cheap, every meeting gets summarized. Once coding agents are cheap, more issues are converted into agent tasks. Once synthetic video becomes cheaper, more communications become video. The baseline expands.

Fifth, infrastructure creates its own expectations. A data center built for AI needs utilization. A company that has prepaid for accelerators wants products that consume them. A cloud provider that has secured power contracts wants demand. The physical stack pushes the social stack to find use.

This is why the argument "models are getting more efficient" cannot settle the energy, labor, or governance debate. Efficiency changes the slope. It does not decide the total.

What the Energy Numbers Show

The best public numbers point in two directions at once: AI hardware and algorithms are improving, while total infrastructure demand is rising.

Epoch AI tracks long-run AI trends and estimates that pre-training compute efficiency is improving at roughly 3x per year, while AI chip performance per dollar has improved substantially and GPU energy efficiency has roughly doubled every 2.4 years on average since 2008. Its energy research page frames the tension directly: AI hardware is becoming more efficient, but compute demand has been growing faster.

The International Energy Agency estimated that data centers used about 415 terawatt-hours of electricity in 2024, around 1.5 percent of global electricity consumption. Its 2025 Energy and AI report projects data-center electricity consumption to more than double to around 945 terawatt-hours by 2030, with AI as a major driver alongside other digital services. The IEA also notes that data centers remain a relatively small share of global electricity demand, but their local effects can be much larger because capacity is geographically concentrated.

In the United States, the Department of Energy summarized a Lawrence Berkeley National Laboratory report finding that data centers consumed about 4.4 percent of total U.S. electricity in 2023. The same report projected approximately 6.7 to 12 percent by 2028, with total data-center electricity usage rising from 176 terawatt-hours in 2023 to an estimated 325 to 580 terawatt-hours by 2028.
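The figures quoted above can be converted into implied compound annual growth rates, which makes them easier to compare. This sketch uses only the numbers stated in this section. The comparison is loose: GPU energy efficiency and total data-center electricity are different quantities, and not all data-center load runs on GPUs.

```python
"""Implied annual growth rates from the IEA, LBNL, and Epoch AI figures quoted above."""


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


# IEA: ~415 TWh (2024) to ~945 TWh (2030), global data centers.
print(f"IEA global: {cagr(415, 945, 6):.1%} per year")

# LBNL: 176 TWh (2023) to 325-580 TWh (2028), U.S. data centers.
print(f"LBNL U.S. low: {cagr(176, 325, 5):.1%} per year")
print(f"LBNL U.S. high: {cagr(176, 580, 5):.1%} per year")

# Epoch AI: GPU energy efficiency doubling roughly every 2.4 years.
print(f"GPU efficiency: {2 ** (1 / 2.4) - 1:.1%} per year")
```

These work out to roughly 15 percent a year for the IEA projection, roughly 13 to 27 percent a year for the LBNL range, and about 33 percent a year of GPU energy-efficiency improvement. Efficiency and total consumption are both growing.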

These numbers should be read carefully. They do not prove that every efficiency gain increases total electricity use. They do not prove that AI will overwhelm the global grid. They do show that per-unit improvements are not currently translating into a shrinking infrastructure footprint. The system is improving and expanding at the same time.

That is the policy-relevant fact. A society cannot govern AI energy demand by looking only at watts per token, training efficiency, model compression, or cooling improvements. It has to ask whether those gains are being banked as reduced load, spent on more capability, or converted into new forms of dependency.

Local Impact, Global Myth

Global percentages can make AI infrastructure sound abstract. One and a half percent of world electricity is significant, but it can also sound manageable beside electric vehicles, air conditioning, industrial motors, and broader electrification. That framing is useful, but it hides the local politics.

Data centers do not land evenly on the planet. The IEA notes that almost half of U.S. data-center capacity is concentrated in five regional clusters. Local grids, water systems, land-use boards, transmission queues, rate structures, emergency planning, and community consent carry burdens that do not appear in a global share.

This is where the rebound becomes institutional. Cheaper AI does not only mean more tokens in the abstract. It means more substations, more interconnection requests, more backup generation, more cooling systems, more water debates, more tax incentives, more utility planning, more ratepayer questions, more land-use fights, and more public officials asked to treat private compute demand as civic destiny.

The earlier essay "The Data Center Becomes a Civic Machine" treated the data center as local infrastructure. The Jevons version explains why that infrastructure keeps asking to grow even when the machines inside it get better. Efficiency lowers friction. Lower friction invites scale. Scale arrives as a zoning meeting, a grid upgrade, a power-purchase agreement, a water permit, or a local promise of jobs.

The global myth says the cloud is everywhere. The local fact says it has addresses.

Efficiency Theater

There is a weak version of AI efficiency politics that should be rejected: the claim that efficiency improvements are themselves proof of sustainability.

A company can announce a more efficient model while increasing total serving volume. It can tout lower energy per query while launching agent products that make many more queries. It can report power usage effectiveness while building larger facilities. It can describe renewable procurement while increasing local peak demand. It can celebrate model compression while pushing AI into workflows that did not previously require computation at all.

This does not mean the efficiency claims are false. It means they are partial. The honest question is: efficiency relative to what total system boundary?

For AI, the boundary has to include training, inference, data-center construction, chip manufacturing, networking, storage, cooling, water, grid upgrades, backup power, model retries, agent loops, downstream automation, and induced use. A narrow metric can still be useful, but it should not be allowed to masquerade as the whole ledger.
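One way to make the boundary question operational is to check a claim's stated scope against the full list above. This is a minimal sketch. The component names come from the text, but the check itself and the example claim are hypothetical.

```python
"""Check an efficiency claim against the system boundary listed in the text."""

# Boundary components named in the text above.
FULL_BOUNDARY = frozenset({
    "training", "inference", "data-center construction", "chip manufacturing",
    "networking", "storage", "cooling", "water", "grid upgrades", "backup power",
    "model retries", "agent loops", "downstream automation", "induced use",
})


def missing_from_claim(covered: set[str]) -> list[str]:
    """Return boundary components that a claim does not account for."""
    unknown = covered - FULL_BOUNDARY
    if unknown:
        raise ValueError(f"unrecognized components: {sorted(unknown)}")
    return sorted(FULL_BOUNDARY - covered)


# A hypothetical "energy per query" claim that only measures serving.
per_query_claim = {"inference", "cooling"}
gaps = missing_from_claim(per_query_claim)
print(f"{len(gaps)} of {len(FULL_BOUNDARY)} components unaccounted for")
```

The narrow metric is not wrong. It covers 2 of 14 components, and it should say so.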

There is an equal and opposite mistake: treating Jevons paradox as proof that efficiency is pointless. That is also wrong. Efficiency can reduce costs, expand access, lower emissions per task, reduce hardware pressure, make public-interest AI possible, and help institutions do useful work with less waste. The problem is not efficiency. The problem is efficiency without limits, disclosure, allocation politics, and demand governance.

The Governance Standard

A serious AI efficiency regime should govern absolute demand, not only per-unit performance.

First, efficiency claims should report total volume. A model provider should be able to say not only that a task uses fewer resources, but whether aggregate training, inference, and serving demand are rising or falling across comparable workloads.
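A disclosure that reports total volume can be as simple as pairing the per-unit figure with the volume figure. In this sketch, the `Period` record and the numbers are hypothetical. The sketch assumes total energy is tasks served times energy per task, so the total change factor is the product of a per-unit factor and a volume factor.

```python
"""Report per-unit efficiency alongside total volume (hypothetical figures)."""

from dataclasses import dataclass


@dataclass
class Period:
    tasks: float            # comparable workloads served in the period
    energy_per_task: float  # e.g. watt-hours per task

    @property
    def total_energy(self) -> float:
        return self.tasks * self.energy_per_task


def disclose(before: Period, after: Period) -> dict[str, float]:
    """Split the change in total energy into a per-unit factor and a volume factor."""
    return {
        "per_unit_factor": after.energy_per_task / before.energy_per_task,
        "volume_factor": after.tasks / before.tasks,
        "total_factor": after.total_energy / before.total_energy,
    }


report = disclose(Period(tasks=1_000_000, energy_per_task=0.3),
                  Period(tasks=4_000_000, energy_per_task=0.15))
print(report)
```

In this example, energy per task halves while volume quadruples, so total energy doubles. A claim that reported only the first number would describe the opposite of what happened to the load.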

Second, data-center permitting should include induced demand analysis. A proposal should not be evaluated only as a fixed facility. Regulators should ask what future expansion, power procurement, grid upgrades, water use, and local rate effects are made more likely by the project.

Third, product categories should be treated differently. A medical documentation assistant, a scientific model, a code agent, a synthetic video engine, an ad-personalization system, and a bulk content farm do not have the same public value. Efficiency policy should not treat every saved watt as equally worth reinvesting.

Fourth, providers should disclose enough to distinguish training from inference. Training runs are dramatic, but inference can become continuous. Public debate needs better evidence about how much demand comes from model development, consumer use, enterprise automation, agent loops, synthetic media, and search-like retrieval.
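The difference between a one-time training run and continuous inference shows up in a break-even calculation. This sketch uses placeholder energy figures, not measurements of any real model. It assumes a flat daily inference load and asks on which day cumulative serving energy first exceeds the training energy.

```python
"""When does cumulative inference overtake a one-time training run? (hypothetical)"""

import math


def days_until_inference_exceeds_training(training_energy_mwh: float,
                                          daily_inference_mwh: float) -> int:
    """First whole day on which cumulative inference energy exceeds training energy."""
    if daily_inference_mwh <= 0:
        raise ValueError("daily inference energy must be positive")
    # Cumulative inference after d days is d * daily; we need d * daily > training.
    return math.floor(training_energy_mwh / daily_inference_mwh) + 1


print(days_until_inference_exceeds_training(10_000, 50))
```

With these placeholder figures, inference passes training on day 201. Any growth in daily use pulls that date earlier, which is why disclosure that stops at training leaves most of the ledger out.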

Fifth, local infrastructure costs should be made legible. Communities need to know who pays for transmission, generation, water systems, backup power, tax abatements, emergency services, and stranded infrastructure if demand projections change.

Sixth, public compute should be protected from pure rebound logic. A public compute commons should prioritize research, evaluation, education, accessibility, reproducibility, and public-sector capacity, not merely maximize utilization because the machines exist.

Seventh, safety governance should account for cheaper attempts. When models become cheaper to run, malicious or careless actors can run more trials, generate more variants, test more jailbreaks, automate more scams, and flood more channels. Lower cost changes the risk surface.
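A deliberately simple probability model shows why lower cost changes the risk surface. This sketch assumes independent attempts with a fixed, hypothetical per-attempt success rate and a fixed budget in arbitrary integer units. Integers keep the floor division exact.

```python
"""How a lower cost per attempt changes the odds for a fixed budget (hypothetical)."""


def p_at_least_one_success(budget: int, cost_per_attempt: int,
                           p_success: float) -> float:
    """Probability that at least one affordable attempt succeeds.

    Assumes attempts are independent and share the same success probability.
    """
    attempts = budget // cost_per_attempt
    return 1 - (1 - p_success) ** attempts


for cost in (1_000, 100, 10):
    p = p_at_least_one_success(budget=10_000, cost_per_attempt=cost, p_success=0.001)
    print(cost, round(p, 3))
```

With the same budget and the same 0.1 percent success rate, cutting the cost per attempt from 1,000 units to 10 raises the chance of at least one success from just under 1 percent to over 60 percent.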

Eighth, demand reduction should be allowed to count as innovation. The AI economy rewards more use. Governance should also reward systems that avoid unnecessary model calls, preserve human judgment, use smaller models where adequate, cache responsibly, refuse wasteful automation, and keep some activities outside model mediation.
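Several of these practices can be combined in one routing layer. The sketch below assumes a cache lookup first, a small model second, and a large model only when the small one reports low confidence. `small_model`, `large_model`, and the length-based confidence rule are hypothetical stand-ins, not real APIs. A responsible production cache would also need expiry and privacy rules, which this sketch omits.

```python
"""Avoiding unnecessary calls: cache first, small model next, large model last."""

from collections import Counter


def small_model(query: str) -> tuple[str, float]:
    # Hypothetical stand-in: returns an answer and a self-reported confidence.
    confidence = 0.9 if len(query) < 40 else 0.4
    return f"short answer to {query!r}", confidence


def large_model(query: str) -> str:
    # Hypothetical stand-in for a larger, more expensive model.
    return f"detailed answer to {query!r}"


class Router:
    def __init__(self, confidence_floor: float = 0.8) -> None:
        self.cache: dict[str, str] = {}
        self.calls = Counter()
        self.confidence_floor = confidence_floor

    def answer(self, query: str) -> str:
        key = query.strip().lower()
        if key in self.cache:                     # repeat question: no model call
            self.calls["cache"] += 1
            return self.cache[key]
        text, confidence = small_model(query)
        self.calls["small"] += 1
        if confidence < self.confidence_floor:    # escalate only when needed
            text = large_model(query)
            self.calls["large"] += 1
        self.cache[key] = text
        return text


router = Router()
for q in ["What is PUE?", "what is PUE?",
          "Summarize the interconnection queue rules for this utility"]:
    router.answer(q)
print(dict(router.calls))
```

Three questions produce two small-model calls, one large-model call, and one cache hit. The design counts avoided calls explicitly, so demand reduction becomes something a system can report.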

The Spiralist Reading

The AI efficiency rebound is a pattern of recursive reality. The model becomes cheaper, so the world asks the model to see more of itself. More activity passes through the model. The model-mediated world then produces new expectations, new markets, new records, new habits, and new training traces. The system gets better at serving a society that has reorganized around being served.

The danger is not only higher electricity demand. It is the quiet normalization of machine intermediation because the marginal cost feels too low to notice. If summarization is cheap, every conversation becomes a record. If generation is cheap, every blank space becomes a content opportunity. If agents are cheap, every task becomes delegable. If surveillance inference is cheap, every signal becomes worth scoring. If persuasion is cheap, every user becomes worth optimizing.

That is why the efficiency debate belongs beside labor transition, synthetic media, institutional design, public compute, and high-control interfaces. Cheap AI is not merely affordable intelligence. It is a permission structure. It tells institutions that they can put models where they previously put friction, waiting, judgment, silence, or human refusal.

The right response is not to oppose efficiency. Waste is not a moral achievement. The right response is to make efficiency answer to purpose. What uses deserve scale? Which uses deserve limits? Who bears the grid cost? Who benefits from the automation? Which forms of attention, care, judgment, and public knowledge should not be converted into endless model calls just because the next call is cheap?

Jevons paradox does not say the future is predetermined. It says that a society must decide what to do with abundance before abundance decides what to do with society.
