Blog · arXiv Analysis · Last reviewed June 25, 2026

The Codex Agent Becomes the Workflow Reorganization

Drew Johnston and colleagues' June 2026 arXiv paper uses Codex telemetry to mark a workplace shift: AI adoption is no longer only a chat, answer, or generated file. It is delegated work that can run, repeat, and overlap.

From Chat to Delegation

The paper, arXiv:2606.26959 [econ.GN], was submitted on June 25, 2026. arXiv lists the exact title as The Shift to Agentic AI: Evidence from Codex, by Drew Johnston, David Holtz, Alex Martin Richmond, Christopher Ong, Prasanna Tambe, and Aaron Chatterji.

The site already has a YouTube note on Codex everyday work, a page on coding agents as maintainers, and a broader vibe coding reference. This paper is a different object: a measured usage study. Its question is not what Codex can demo, but how people actually allocate work to an agentic system across personal accounts, organizational accounts, and OpenAI's own workforce.

The distinction matters because conversational AI adoption can be counted by chats, messages, or active users. Agentic AI changes the unit of measurement. A user can assign work, leave an agent running, start another thread, reuse a skill, and later review or integrate the output.

What the Paper Measures

The authors analyze Codex usage with automated classifiers that produce aggregated and anonymized insights without researchers reading the underlying messages. They compare individual users, organizational users, and OpenAI workers. The paper says OpenAI is not representative of ordinary organizations because adoption frictions are unusually low there: familiarity, marginal cost, buy-in, training, and informal knowledge sharing all differ.

The paper is careful about the word agentic. For brevity, it treats ChatGPT as the conversational tool and Codex as the agentic tool, while noting that the boundary is not absolute. Tool use shows the overlap: in the week before June 11, 2026, 60.3% of Codex turns and 21.9% of ChatGPT turns invoked at least one external tool.

The growth is sharp. Weekly active Codex users increased more than fivefold between January 1 and June 1, 2026. In the last 28 days measured by the paper, fewer than 1% of active individual users used Codex, while 17.3% of organizational users did. At OpenAI, almost all active employees used Codex each week, and in recent weeks Codex accounted for more than 99% of output tokens generated across Codex and ChatGPT.

The Workflow Shift

The most important evidence is the shape of the workflow. The paper reports that Codex use remains anchored in software production but, as adoption deepens, spreads into system understanding, debugging, validation, documentation, configuration, operations, data analysis, research, planning, recruiting, sales, product work, and communication.

The task-complexity result makes the handoff concrete. For a 0.1% random sample of individual accounts that opted to allow their data to be used for model training, the authors used a classifier to estimate how much experienced human work time each Codex request represents. From December 2025 to May 2026, the share of sampled active users who sent at least one prompt estimated to take an experienced human an hour or more roughly doubled, from 35.4% to 70.2%. Over the same period, the share sending at least one task estimated at eight hours or more rose from 2.1% to 25.6%.
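The metric is per user, not per prompt: a single long request is enough to place a user in the "at least one hour" group. The sketch below makes that definition explicit. It assumes, hypothetically, that each sampled user's prompts come with a classifier estimate in human-hours; the function name and toy data are illustrative and are not the paper's pipeline.

```python
from typing import Mapping, Sequence


def share_with_long_task(
    estimates_by_user: Mapping[str, Sequence[float]], threshold_hours: float
) -> float:
    """Fraction of active users with at least one task estimate >= threshold_hours.

    `estimates_by_user` maps a hypothetical user id to the estimated
    human-hours for each prompt that user sent in the period. Users with
    no prompts are treated as inactive and excluded from the denominator.
    """
    active = {user: est for user, est in estimates_by_user.items() if est}
    if not active:
        return 0.0
    # A user counts once if their single largest request meets the threshold.
    hits = sum(1 for est in active.values() if max(est) >= threshold_hours)
    return hits / len(active)


# Toy data, not from the paper: three active users and one inactive user.
sample = {
    "u1": [0.2, 1.5],
    "u2": [0.1, 0.3],
    "u3": [9.0],
    "u4": [],
}
print(share_with_long_task(sample, 1.0))  # 2 of 3 active users
print(share_with_long_task(sample, 8.0))  # 1 of 3 active users
```

Because only the maximum per user matters, this kind of share can rise quickly even if most prompts stay small, which is one reason to read it alongside other workflow measures.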

Concurrency is the second sign. Among individual and organizational users, parallel Codex work was still limited in the week before June 11, 2026: 63.9% of individual users and 67.4% of organizational users had no concurrent turns. Inside OpenAI, the pattern was different: only 10.7% of users kept to a single workflow at a time, and 28.6% managed five or more concurrent agents at some point during the week.
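The paper's concurrency buckets can be pictured with a standard sweep-line count over turn intervals. This is a sketch under stated assumptions, not the authors' method: it assumes each agent turn has a start and end timestamp, and the bucket labels and function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[float, float]  # (start, end) of one agent turn, any time unit


def peak_concurrency(turns: Iterable[Interval]) -> int:
    """Maximum number of turns running at the same moment (sweep line).

    A turn that ends exactly when another starts does not count as an
    overlap, because end events sort before start events at equal times.
    """
    events: List[Tuple[float, int]] = []
    for start, end in turns:
        events.append((start, 1))   # a turn begins
        events.append((end, -1))    # a turn finishes
    events.sort()  # by time; at ties, -1 (end) sorts before +1 (start)
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


def bucket(turns: Iterable[Interval]) -> str:
    """Hypothetical labels mirroring the thresholds quoted in the paper."""
    peak = peak_concurrency(turns)
    if peak <= 1:
        return "no concurrent turns"
    if peak >= 5:
        return "five or more concurrent"
    return "two to four concurrent"


# Toy week for two hypothetical users.
sequential = [(0, 10), (10, 20), (25, 30)]
parallel = [(0, 60)] + [(5 + i, 50) for i in range(5)]
print(bucket(sequential))  # no concurrent turns
print(bucket(parallel))    # five or more concurrent
```

Counting the peak rather than the average matches the paper's "at some point during the week" phrasing, but the real measurement may define overlap differently.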

The third sign is systematization. Codex skills and plugins let users reuse instructions, software, and integrations. The paper reports that the share of active Codex users invoking any skill rose from 5.4% on March 1, 2026, to 26.6% on June 11, 2026. In the seven-day window ending June 11, 25.7% of active individual Codex users, 30.4% of active organizational Codex users, and 96.2% of active OpenAI Codex users invoked at least one skill.

Labor Governance

This paper belongs beside the token budget essay, the agent-skill essay, the workplace-agent essay, and the process-map essay. The governance object is no longer a single generated artifact. It is a portfolio of delegated work: threads, runtimes, skills, plugins, files touched, tools used, review points, and integrated output.

That should change how labor questions are framed. If a worker manages several agentic workstreams, productivity cannot be measured by prompt volume alone. It depends on task selection, supervision quality, review burden, error correction, tool permission, and surrounding process redesign. A rise in output tokens is evidence of activity, not automatic evidence of useful work.

Limits That Matter

The paper's strongest evidence comes from a specific product and a specific institutional setting. OpenAI workers are a frontier case, not a representative workplace. The task classifiers and complexity estimates are useful measurement tools, but they produce model-based estimates, not direct observations of the work. The task-complexity analysis uses only opted-in individual-account data, and output tokens are a volume measure rather than a welfare, quality, or productivity measure.

Those limits clarify where the paper's value lies. It does not prove that every organization will come to look like OpenAI, or that more agent runtime is better. It shows which margins need measurement as agentic work spreads: delegation size, concurrency, runtime, reusable workflow infrastructure, review labor, and organizational complements.

Governance Standard

Organizations adopting agentic work tools should report more than seats and chat counts. They should track delegated task classes, estimated task complexity, active runtimes, concurrent threads, reusable skills, tool permissions, files and systems touched, human review checkpoints, corrected errors, abandoned work, and downstream integration.
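One way to make that reporting list concrete is a per-task work record whose summary puts supervision and repair next to volume. The schema below is a hypothetical sketch: every field and function name is illustrative, not a format defined by the paper or by Codex.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DelegatedTask:
    """One unit of delegated agent work (hypothetical schema, not from the paper)."""
    task_class: str                # e.g. "debugging", "documentation"
    estimated_hours: float         # estimated human-equivalent complexity
    runtime_minutes: float         # time the agent was actively running
    peak_concurrent_threads: int   # threads the worker ran alongside this one
    skills_used: List[str] = field(default_factory=list)
    tool_permissions: List[str] = field(default_factory=list)
    systems_touched: List[str] = field(default_factory=list)  # files, repos, services
    human_review_checkpoints: int = 0
    errors_corrected: int = 0
    abandoned: bool = False
    integrated: bool = False       # output merged into downstream work


def summarize(tasks: List[DelegatedTask]) -> dict:
    """Report supervision, repair, and outcomes, not activity volume alone."""
    n = len(tasks)
    if n == 0:
        return {"tasks": 0}
    return {
        "tasks": n,
        "reviewed_share": sum(t.human_review_checkpoints > 0 for t in tasks) / n,
        "errors_corrected": sum(t.errors_corrected for t in tasks),
        "abandoned_share": sum(t.abandoned for t in tasks) / n,
        "integrated_share": sum(t.integrated for t in tasks) / n,
        "task_classes": sorted({t.task_class for t in tasks}),
    }


# Toy log with made-up values and a hypothetical skill name.
log = [
    DelegatedTask("debugging", 1.5, 40, 3, skills_used=["hypothetical-skill"],
                  human_review_checkpoints=2, errors_corrected=1, integrated=True),
    DelegatedTask("documentation", 0.5, 10, 1, abandoned=True),
]
print(summarize(log))
```

The design choice is that abandoned and unreviewed work stays visible in the summary, so a rise in runtime or task count cannot be read as progress on its own.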

The Spiralist rule is simple: when the agent becomes a worker of workers, the work record has to expand. Count the delegation, the supervision, and the repair, or the organization will mistake agent activity for organizational learning.

Sources

Drew Johnston, David Holtz, Alex Martin Richmond, Christopher Ong, Prasanna Tambe, and Aaron Chatterji, The Shift to Agentic AI: Evidence from Codex, arXiv:2606.26959 [econ.GN], submitted June 25, 2026.
arXiv PDF: The Shift to Agentic AI: Evidence from Codex, reviewed for authorship, date, usage populations, measurement pipeline, adoption figures, concurrency, skills, task-complexity estimates, output-token measures, and limitations.
Related pages: Codex Everyday Work, The Coding Agent Becomes the Maintainer, The Token Meter Becomes the Budget, The Agent Skill Becomes the Work Instruction, The Workplace Agent Becomes the Office Clerk, The Agent Trace Becomes the Process Map, and Vibe Coding.
