Last reviewed July 2, 2026

The Agent Autonomy Ladder Becomes the No-Go Zone

Margaret Mitchell, Avijit Ghosh, Alexandra Sasha Luccioni, and Giada Pistilli's paper argues that fully autonomous AI agents should not be developed. The useful move is not a slogan against all automation. It is a control analysis: each step up the autonomy ladder gives the system more room to plan, act, adapt, and compound mistakes before a human can meaningfully intervene.

The Paper

The paper is Fully Autonomous AI Agents Should Not be Developed, arXiv:2502.02649 [cs.AI], by Margaret Mitchell, Avijit Ghosh, Alexandra Sasha Luccioni, and Giada Pistilli. arXiv lists the first version as submitted on February 4, 2025, and the current version as v3, last revised on October 20, 2025.

The paper draws on scientific literature and product marketing to define agent autonomy levels independently of marketing language. The abstract's claim is blunt: as users cede more control to an AI agent, risks to people increase, and safety risks are especially concerning because they can affect human life and other ethical values.

That makes the paper a good fit for this site. The argument treats autonomy as a transfer of authority, not as a halo around a model. A system that only drafts text has one risk shape. A system that chooses tools, runs multi-step plans, accesses accounts, modifies records, or writes and executes new code has another.

The Autonomy Ladder

The paper defines AI agents as software systems that can create context-specific plans in nondeterministic environments. It then lays out a five-level ladder of agentic control. A simple processor leaves program flow to humans. A router lets the model determine basic program flow. A tool caller lets the model choose functions and arguments. A multi-step agent controls iteration and continuation. The fully autonomous level lets the model create and execute new code, leaving the system itself in control.
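
As a rough illustration, the five rungs summarized above can be written down as an ordered type, so that "more autonomy" becomes a comparison a system can actually make. The class and member names here are illustrative assumptions, not identifiers from the paper.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """Five rungs as summarized above; names are illustrative, not the paper's."""
    SIMPLE_PROCESSOR = 1   # humans control program flow
    ROUTER = 2             # model determines basic program flow
    TOOL_CALLER = 3        # model chooses functions and arguments
    MULTI_STEP_AGENT = 4   # model controls iteration and continuation
    FULLY_AUTONOMOUS = 5   # model creates and executes new code


def cedes_more_control(a: AutonomyLevel, b: AutonomyLevel) -> bool:
    """True when rung `a` hands more control to the model than rung `b`."""
    return a > b


for level in AutonomyLevel:
    print(level.value, level.name)
```

Using an ordered enum is a design choice for this sketch: it makes the ladder comparable, but it also flattens distinctions (such as which tools are reachable) that a real system record would need to state separately.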

The key governance fact is that these are not cosmetic labels. Each rung changes who chooses the next action, which environment can be touched, how many steps can occur before review, and whether a human can still understand the path from request to effect.

For a deployed agent, the autonomy ladder should be visible in the system record. A product card that says "AI agent" without naming whether the model can route, call tools, continue loops, write code, execute code, or trigger other agents is hiding the most important design fact.

Control Is the Unit

The paper's contribution is strongest when it refuses to treat autonomy as a single capability score. The important variable is control: control over program flow, control over action surfaces, control over tools, control over private data, control over the number and timing of steps, control over whether a plan can be revised, and control over whether new code can be introduced into the environment.

This reframes agent governance. A human may remain nominally "in the loop" while losing the practical ability to inspect, halt, or repair an action sequence. Review after a hundred tool calls is not the same as approval before an irreversible action. A dashboard that shows a final success message is not the same as a trace that preserves prompts, observations, tool arguments, intermediate states, failures, retries, and human approvals.
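
The difference between approval before an irreversible action and review after the fact can be sketched as a small gate that records every step. This is a minimal sketch under stated assumptions: `TraceEvent`, `GatedRunner`, and the `approver` callback are hypothetical names for illustration, and the trace keeps only a subset of the fields listed above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class TraceEvent:
    step: int
    tool: str
    arguments: dict
    irreversible: bool
    approved_by: Optional[str]  # None when no approval was needed or given
    outcome: str                # "executed", "blocked", or "failed: ..."


@dataclass
class GatedRunner:
    # Hypothetical callback: returns the approver's name, or None to decline.
    approver: Callable[[str, dict], Optional[str]]
    trace: list = field(default_factory=list)

    def _record(self, tool, arguments, irreversible, approved_by, outcome):
        self.trace.append(TraceEvent(len(self.trace) + 1, tool, dict(arguments),
                                     irreversible, approved_by, outcome))

    def run(self, tool: str, action: Callable[..., Any],
            arguments: dict, irreversible: bool) -> Any:
        approved_by = None
        if irreversible:
            # Ask for approval BEFORE acting, not after.
            approved_by = self.approver(tool, arguments)
            if approved_by is None:
                self._record(tool, arguments, True, None, "blocked")
                return None
        try:
            result = action(**arguments)
        except Exception as exc:
            self._record(tool, arguments, irreversible, approved_by, f"failed: {exc}")
            raise
        self._record(tool, arguments, irreversible, approved_by, "executed")
        return result


sent = []
runner = GatedRunner(approver=lambda tool, args: None)  # reviewer declines everything
runner.run("read_file", lambda path: "contents", {"path": "notes.txt"}, irreversible=False)
runner.run("send_email", lambda to: sent.append(to), {"to": "a@example.com"}, irreversible=True)
print([event.outcome for event in runner.trace])
```

In this usage the read goes through and the email is never sent, yet both steps appear in the trace. A dashboard showing only a final status would hide the blocked step.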

The ladder also explains why agent systems need stronger records than chatbots. The output is not only language. It may be a changed database row, a sent email, a purchased item, a merged pull request, a scheduled payment, a security exception, or a new program running with inherited credentials.

The Risk Stack

The paper maps autonomy against values such as accuracy, assistiveness, efficiency, equity, flexibility, humanlikeness, privacy, safety, security, and trust. Some benefits can increase with autonomy: an agent may do more work, adapt to context, and save time. But the same flexibility expands the harm surface when the model is wrong, over-trusted, misused, or attacked.

The central risk pattern is compounding. More autonomy means more steps, more branches, more hidden assumptions, more access, and more time for one error to become a chain of errors. A misplaced file path, misunderstood instruction, poisoned webpage, stale credential, or false confidence score can matter more when the agent can keep acting.
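
A back-of-the-envelope calculation shows why step count matters. The model below is an assumption made for illustration, not something the paper states: each step fails independently with the same probability, so the chance of at least one failure in a chain of n steps is 1 - (1 - p)^n. Real agent errors are correlated, and they can compound faster than this.

```python
def chance_of_any_error(per_step_error: float, steps: int) -> float:
    """Probability that at least one of `steps` independent steps goes wrong.

    Assumes every step fails independently with the same probability,
    which is a simplification made only for this illustration.
    """
    if not 0.0 <= per_step_error <= 1.0:
        raise ValueError("per_step_error must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return 1.0 - (1.0 - per_step_error) ** steps


for n in (1, 10, 100):
    print(n, round(chance_of_any_error(0.01, n), 3))
```

Under these assumptions, a 1% per-step error rate becomes roughly a 63% chance of at least one error across a hundred unreviewed steps.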

Humanlike presentation adds another layer. If a system sounds helpful and confident while also controlling tools, users may trust it past the evidence. The paper treats misplaced trust as a harm multiplier: people may cede more control precisely when they should be asking for boundaries, uncertainty, and review.

The No-Go Zone

The fully autonomous level is the paper's no-go zone: systems capable of writing and executing their own code beyond predefined constraints. The problem is not that every generated script is dangerous. The problem is that code creation plus code execution can change the agent's own action space, bypass assumptions built into the original workflow, and produce failures that developers did not enumerate.

In high-stakes domains, that is an authority boundary. A system should not be allowed to expand its own permissions, invent new operational pathways, or act outside human-defined constraints simply because it can produce plausible code. The governance question is not "can the model solve the task?" It is "who remains able to say no before the system changes the world?"

Semi-autonomous systems can still be valuable. The paper's argument leaves room for bounded delegation, human approval, reversible actions, scoped tools, and domain-specific automation. It draws the line at ceding full control where safeguards cannot be foreseen or meaningfully enforced.

Governance Standard

Every agent deployment should declare its autonomy level. The declaration should name whether the model can select branches, call tools, loop, invoke other agents, access external services, modify persistent state, write code, execute code, schedule future actions, or operate after the user leaves.
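
One way to make such a declaration concrete is a record with one flag per capability named above. The field names and the mapping from flags to rungs are assumptions for illustration. In particular, the sketch conservatively treats any system that both writes and executes code as a candidate for the top rung, whereas the paper's own line concerns code that goes beyond predefined constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutonomyDeclaration:
    # One flag per capability listed in the paragraph above (names are illustrative).
    selects_branches: bool = False
    calls_tools: bool = False
    loops: bool = False
    invokes_other_agents: bool = False
    accesses_external_services: bool = False
    modifies_persistent_state: bool = False
    writes_code: bool = False
    executes_code: bool = False
    schedules_future_actions: bool = False
    operates_after_user_leaves: bool = False


def declared_rung(d: AutonomyDeclaration) -> str:
    """Map a declaration to the highest rung it implies (conservative assumption)."""
    if d.writes_code and d.executes_code:
        return "fully autonomous (candidate)"
    if d.loops or d.invokes_other_agents:
        return "multi-step agent"
    if d.calls_tools:
        return "tool caller"
    if d.selects_branches:
        return "router"
    return "simple processor"


support_bot = AutonomyDeclaration(selects_branches=True, calls_tools=True,
                                  modifies_persistent_state=True)
print(declared_rung(support_bot))  # tool caller
```

The remaining flags (external services, persistent state, scheduling, unattended operation) do not change the rung in this sketch, but they still belong in the declaration because they describe what the agent can touch.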

The minimum receipt should include the model and harness version, system prompt or policy bundle, available tools, credentials, data sources, action limits, approval gates, logging rules, rollback route, incident owner, and no-go zones. For code-capable agents, it should separately record whether generated code can run, where it runs, with what permissions, under what sandbox, and after whose approval.
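
The receipt can be checked mechanically before a deployment goes live. The following is a minimal sketch: the key names are hypothetical labels for the items listed above, and the receipt is assumed to be a plain dictionary.

```python
# Hypothetical key names for the receipt items listed above.
REQUIRED_FIELDS = (
    "model_version", "harness_version", "policy_bundle", "tools",
    "credentials", "data_sources", "action_limits", "approval_gates",
    "logging_rules", "rollback_route", "incident_owner", "no_go_zones",
)

# Extra items recorded separately for code-capable agents.
CODE_FIELDS = (
    "generated_code_can_run", "execution_environment",
    "execution_permissions", "sandbox", "execution_approver",
)


def missing_receipt_fields(receipt: dict, code_capable: bool) -> list:
    """Return required keys that are absent or set to None, in a stable order.

    Values such as False or an empty list count as recorded: "no tools"
    is a valid statement, while a missing entry is not.
    """
    required = REQUIRED_FIELDS + (CODE_FIELDS if code_capable else ())
    return [name for name in required if receipt.get(name) is None]


draft = {name: "recorded" for name in REQUIRED_FIELDS}
print(missing_receipt_fields(draft, code_capable=False))  # []
print(missing_receipt_fields(draft, code_capable=True))   # the five code fields
```

Treating only absent or `None` values as missing is a deliberate choice in this sketch, so that a receipt can explicitly record that generated code cannot run.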

The minimum operating rule is staged autonomy. Read-only assistance can have one review standard. Tool calls that change records need stronger approval and rollback. Multi-step agents need trace-level observability. Code-writing agents need sandboxing and human acceptance. Code-writing and code-executing agents that can move beyond predefined constraints should not be treated as a product tier. They should be treated as a prohibited authority expansion unless an institution can prove a bounded, inspectable, revocable control regime.
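
Staged autonomy can be sketched as a lookup from stage to required controls, with the top stage refused by default. The stage and control names are illustrative assumptions. The sketch also assumes the top stage inherits the code-writing controls, which the text above does not state explicitly.

```python
# Stage -> controls named in the paragraph above (labels are illustrative).
REQUIRED_CONTROLS = {
    "read_only": {"review"},
    "record_changing_tools": {"approval", "rollback"},
    "multi_step": {"trace_observability"},
    "code_writing": {"sandbox", "human_acceptance"},
    # Assumption: the top stage needs at least the code-writing controls.
    "unbounded_code_execution": {"sandbox", "human_acceptance"},
}

PROOF = "proven bounded, inspectable, revocable control regime"


def missing_controls(stage: str, controls_in_place: set,
                     proven_control_regime: bool = False) -> list:
    """Return what still blocks deployment at `stage`; an empty list means allowed."""
    missing = set(REQUIRED_CONTROLS[stage]) - set(controls_in_place)
    if stage == "unbounded_code_execution" and not proven_control_regime:
        # Prohibited by default: controls alone do not unlock this stage.
        missing.add(PROOF)
    return sorted(missing)


print(missing_controls("record_changing_tools", {"approval"}))  # ['rollback']
print(missing_controls("unbounded_code_execution", {"sandbox", "human_acceptance"}))
```

The stages here are checked independently rather than cumulatively. A real policy would likely require each stage to include the controls of the stages below it.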

This belongs beside AI Agents, Agent Tool Permission Protocol, Agent Audit and Incident Review, AI Audit Trails, Human Oversight in AI, and Humane Friction Standard. The practical lesson is simple: autonomy should be earned, logged, bounded, and reversible.

Limits

The paper is a normative and conceptual analysis, not a benchmark proving that every agent at a given level will fail. It does not supply a complete engineering standard for every domain. It also should not be flattened into "never build useful agents." Many low- and medium-autonomy systems may be appropriate when their action surfaces, users, and failure costs are understood.

Its value is the line it makes visible. The moment an AI system can plan across steps, choose tools, touch live environments, and especially create and execute new code, the user is no longer only consuming a model output. They are delegating authority. Authority needs a boundary before capability gets one.
