ReAct Prompting
ReAct prompting is the Reasoning and Acting pattern for language-model agents: a model alternates between task reasoning, tool or environment actions, and observations so it can plan, gather evidence, update state, and decide the next step under application control.
Definition
ReAct is short for Reasoning and Acting. In the original formulation, a language model is prompted to produce an interleaved trajectory: it reasons about the task, requests or takes an action in an external environment, receives an observation, and then reasons again before choosing the next step.
The important point is the controlled loop, not the literal prompt labels. ReAct is useful when a model must decide what to inspect next, use an external source or environment, and revise its plan from the result. It is not a guarantee that the written reasoning is faithful, that the chosen action is safe, or that the returned observation is trustworthy.
ReAct is a prompting and orchestration pattern, not a special model architecture and not the React JavaScript library. In modern systems it often appears as a tool-call loop, planner state, or agent runtime rather than a visible transcript with the words Thought, Action, and Observation.
A simple ReAct loop has three roles:
- Reasoning trace: a natural-language step that records the model's current interpretation, plan, uncertainty, or next information need.
- Action: a search, lookup, navigation command, API call, tool call, environment move, or other operation outside ordinary answer text.
- Observation: the result returned by the tool or environment, which becomes new context for the next reasoning step.
ReAct is not synonymous with agent autonomy or chain-of-thought disclosure. A system can implement a ReAct-style loop with hidden reasoning, summaries, structured tool calls, planner state, or traces rather than a raw scratchpad. What matters for assessment is whether the action path, tool inputs, tool outputs, approvals, and final synthesis can be inspected after the fact.
In current agent systems, the visible labels may differ. A platform might use tool-call blocks, function-call messages, planner logs, scratchpads, traces, or internal state rather than literal Thought, Action, and Observation strings. The core pattern is still the same: reasoning or planning guides action, and action returns evidence or state that updates the next step.
Snapshot
- Type: prompt and agent-control pattern, not a standalone safety architecture.
- Core loop: reason or plan, request an action, receive an observation, update the next step, then stop or continue.
- Origin: Yao et al.'s 2022 ReAct paper, published at ICLR 2023.
- Modern form: often implemented through structured tool calls, function calling, hosted tools, computer-use tools, or orchestration code rather than literal prompt labels.
- Main governance issue: observations are untrusted data, not authority; the host runtime must control tool availability, permissions, side effects, and audit evidence.
- Evaluation rule: test the whole trajectory, not only the final answer.
Boundary Tests
Not React. ReAct prompting is unrelated to the React web framework. The capitalization signals Reasoning plus Acting, not a frontend technology.
Not chain-of-thought disclosure. A system can use ReAct-style planning while hiding raw reasoning, exposing only tool calls, or retaining an internal trace for audit. Conversely, a visible chain of thought without external actions is not ReAct.
Not tool safety. ReAct can make tool use more inspectable, but it does not by itself enforce authorization, least privilege, sandboxing, semantic validation, rollback, or human approval.
Not evidence by itself. A clean-looking trajectory can still be unfaithful or incomplete. A serious assessment should compare the trace against tool logs, observations, approvals, final outputs, and independent evaluations.
Not full autonomy. A ReAct loop may run for one lookup or for a long agent task. The autonomy level depends on tools, permissions, memory, stopping policy, and whether a human or host runtime controls execution.
Lineage
The ReAct paper by Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao was submitted in October 2022 and appeared as an ICLR 2023 paper. It joined two lines of work that had often been treated separately: chain-of-thought prompting for reasoning and language-model action generation for interactive environments.
The paper evaluated ReAct on multi-hop question answering, fact verification, text-game tasks, and web-shopping navigation. The authors argued that reasoning traces help the model plan and handle exceptions, while actions let it query external sources or environments instead of relying only on internal model knowledge.
Google Research's accompanying post framed the method as a way to combine dynamic planning with grounded interaction. The project site emphasized that ReAct prompts use few-shot task-solving trajectories containing human-written reasoning, actions, and environment observations.
ReAct became influential because it translated the abstract idea of an AI agent into a practical prompting and tracing pattern. Many later agent frameworks, tutorials, and tool-use systems inherited the basic loop even when they replaced prompt text with structured function calls or runtime orchestration.
Current Context
As of June 25, 2026, ReAct is best treated as an agent-control pattern rather than a complete safety architecture. The original paper used few-shot prompt trajectories to interleave reasoning and environment actions. Production systems now often implement the same loop with structured tool calls, hosted tools, computer-use environments, approval gates, logs, and orchestration code outside the model.
Current function-calling APIs make the execution boundary explicit: the model can return a tool call, but application code executes the function, returns the tool output, and may continue the loop. OpenAI's current documentation describes function calling as a multi-step flow in which the application provides tools, receives a model-generated call, executes code, and returns tool output; it also documents strict JSON-schema function definitions, tool-choice controls, and reasoning items that must be preserved across tool-call turns for reasoning models. OpenAI's tool documentation says tool_search can defer tool loading and is supported only by gpt-5.4 and later models, so deferred tool loading should be documented as a model-and-platform feature rather than a generic ReAct property. Those mechanisms can make ReAct-style systems more inspectable, but schema validity is not the same as semantic safety.
Framework usage has also drifted. Some libraries still expose a ReActAgent or create_react_agent helper, while others route function-calling models through structured function-agent paths and reserve ReAct-style text loops for models without native tool-calling support. A page that says a system "uses ReAct" should therefore name the actual framework, version, tool format, trace policy, and stop condition.
The agent standards context has also caught up. NIST's 2026 AI Agent Standards Initiative frames agents capable of autonomous actions as a standards problem involving secure operation, interoperability, identity infrastructure, authentication, and security evaluation. OWASP's Top 10 for Agentic Applications for 2026 and related Agentic Security Initiative materials treat goal hijacking, tool misuse, identity and privilege abuse, unexpected code execution, memory or context poisoning, insecure inter-agent communication, cascading failures, human-agent trust exploitation, and rogue agents as distinct agent risks.
That matters for ReAct because the pattern deliberately brings external observations into the model's next step. A search result, webpage, ticket, email, repository file, database row, or tool response can help ground the agent, but it can also carry stale facts, poisoned context, hidden instructions, or low-quality evidence. Modern ReAct-style systems therefore need data-versus-instruction separation, scoped tools, sandboxing, observability, and human review for consequential actions.
Reasoning-trace visibility is also unsettled. ReAct's natural-language traces can help debugging and oversight, but written traces are not proof of the model's true causal process and may contain sensitive information. Many systems therefore expose tool traces, citations, approval events, or summarized rationales rather than a full raw scratchpad. The governance need is not theatrical transparency; it is enough retained evidence to reconstruct what the model requested, what the runtime allowed, what external systems returned, and why consequential actions were approved or blocked.
How It Works
Prompted trajectories. The model is shown examples where a task is solved through alternating reasoning, action, and observation. These examples teach the model the format and the habit of using tools when internal knowledge is insufficient.
Reasoning to act. The reasoning trace can decompose the goal, decide what information is missing, select the next tool, maintain a plan, or revise after an error.
Acting to reason. The action produces an observation: a search result, page content, database answer, environment state, browser state, or tool return value. The next reasoning step should incorporate that new evidence.
Execution boundary. In a governed deployment, the model proposes a tool call and the host application decides whether to execute it. A correctly formatted call is still only a request; authorization, validation, rate limits, sandboxing, and human review belong outside the model.
Observation hygiene. Tool outputs should be labeled and handled as data or evidence, not as instructions. A webpage, email, issue comment, repository file, or API response can supply facts while still containing adversarial or irrelevant text.
Grounded iteration. Instead of producing a single final answer, the model advances through a bounded loop. It can gather evidence, inspect partial results, choose another action, or stop and answer.
Human inspectability. Because the trajectory contains both reasoning and actions, a human or automated monitor can inspect where the model went wrong: a bad premise, bad search target, unsafe tool choice, bad observation interpretation, or bad final synthesis.
Runtime enforcement. In deployed systems, the application should enforce loop limits, tool schemas, approval gates, permission scopes, and logging. The model can request an action, but the host system decides whether the action is available, authorized, safe enough to run automatically, or requires review.
Why It Matters
ReAct is one of the bridge patterns between chat-style language models and agentic systems. It gives a model a procedure for doing work: think about the next step, use a tool, read what happened, and continue.
The pattern matters for factuality because it can reduce blind reliance on model memory. In question-answering and fact-verification tasks, ReAct lets the model retrieve external evidence and update the answer path. It does not eliminate hallucination, but it creates more places where evidence can enter the loop.
It matters for agents because it turns planning and tool use into a visible sequence. Browser agents, coding agents, research assistants, customer-support agents, and robotics systems all face the same operational problem: choose the next action under uncertainty, then interpret the result.
It also matters for oversight. A final answer alone hides the path. A ReAct-style trajectory can expose which tool was called, what the model thought it was doing or was summarized as doing, what observation came back, and whether the model repaired or compounded an error.
Limits and Failure Modes
Unfaithful reasoning. The written reasoning trace may not fully reflect the causal process that produced the next action. A clean-looking trajectory is evidence, not proof.
Prompt injection. If actions retrieve untrusted webpages, emails, documents, or tool outputs, those observations may contain instructions that try to redirect the agent.
Loop drift. Multi-step action can magnify early mistakes. A wrong search query, misread observation, or mistaken plan can send the agent down an irrelevant path.
Tool overuse. ReAct can encourage needless tool calls when a direct answer, clarification question, or refusal would be better.
Tool-surface drift. Hosted tools, remote MCP servers, plugins, and deferred tool search can change which actions are available during or after an evaluation. Without versioned tool inventories, old test results may not describe the deployed system.
Observation misuse. The model may treat a stale, partial, adversarial, or low-quality observation as authoritative.
Trace exposure. Some reasoning traces may include sensitive data, unsafe operational details, or brittle policy logic. Oversight traces and user-facing explanations may need different handling.
Authority confusion. The model may fail to distinguish system instructions, developer policy, user requests, retrieved evidence, tool outputs, and malicious instructions embedded in observations.
Receipt gaps. If the system retains only the final answer, incident reviewers may be unable to tell whether a failure came from planning, tool selection, argument filling, runtime authorization, external API behavior, observation interpretation, or human approval.
Unsafe stopping. The agent may stop too early with incomplete evidence, continue too long after contradictions appear, or keep acting after it should ask for human review.
Side effects. In the original research, many actions were information gathering or simulated environment moves. In deployed agents, actions can send messages, change records, spend money, alter code, or control devices. The governance burden rises with the consequence of the action.
Governance Requirements
ReAct-style systems should distinguish data from authority. Observations returned from tools should inform the task, not silently rewrite system rules, developer instructions, or permission boundaries.
Tool access should be scoped by least privilege. Read-only search and retrieval are lower risk than write access to email, calendars, repositories, payments, accounts, or production systems. Higher-impact actions should require explicit confirmation, identity binding, and auditable approval.
Structured tool calls should be validated for both syntax and semantics. JSON schema, strict mode, constrained decoding, and typed arguments can reduce malformed calls, but they do not prove that an action is appropriate, authorized, proportionate, or reversible.
Agent traces should be logged in enough detail for debugging and incident review: model and prompt versions, available tools, tool schemas and versions, selected actions, arguments, observations, approvals, errors, retries, and final output. Sensitive logs should have retention and access rules.
If the system uses tool search, plugins, MCP servers, or other dynamic tool discovery, the trace should record not only the tool calls that ran but also the tools that were loaded, declined, hidden, or made newly available during the run. Deferred tools can change the evaluated action surface after the initial prompt, so they need inventory control and replayable evidence.
Each deployment should maintain an authority map: which instructions outrank which requests, which observations are untrusted, which tools can create side effects, which human roles can approve actions, and which runtime checks can block the model even when it asks confidently.
Evaluations should test the whole loop, not only the final answer. A ReAct agent can fail through bad planning, unsafe tool choice, prompt-injection susceptibility, poor observation interpretation, tool-surface drift, or inability to stop. Test sets should include adversarial observations and side-effect tools, not only clean retrieval tasks.
Product interfaces should avoid turning hidden reasoning into false certainty. If users only see the final answer, the system should still preserve internal auditability. If users see a summary of reasoning, it should not be presented as a complete transcript of the model's cognition unless that is actually what was retained and reviewed.
ReAct deployments should also define a stopping policy. The loop should have step, time, cost, and risk limits; escalation rules for contradictory or adversarial observations; and clear approval gates before irreversible or externally visible actions.
Trajectory Evidence Record
A governed ReAct-style deployment should retain enough evidence to reconstruct the loop without preserving unlimited sensitive scratchpad text. At minimum, record:
- Run boundary: user or workflow request, model version, system and developer instruction versions, agent scaffold, start time, stop condition, and risk tier.
- Tool boundary: tools available at each step, dynamically loaded tools, schemas, side-effect class, credentials or scopes used, and tools blocked or hidden by policy.
- Trajectory: reasoning summary or trace class retained, tool-call IDs, tool arguments, observations, errors, retries, citations, and final synthesis.
- Authority: approval prompts, approving human or role, policy checks, semantic validation, sandbox decision, revocation path, and any action that crossed a read-only boundary.
- Observation trust: source labels for webpages, emails, files, retrieved passages, database rows, API responses, and other untrusted content that entered the loop.
- Outcome: final answer or action, files changed, messages sent, external systems touched, blocked actions, incident link if any, and post-run retention class.
This record links ReAct to AI Agent Observability, AI Audit Trails, AI System Inventory, and AI Change Management. Without it, "uses ReAct" says little about what the system actually did.
Source Discipline
Claims about ReAct should distinguish the original research method, a prompt template, a framework implementation, a product agent, and a regulated deployment. The ReAct paper can support the lineage and benchmark claims it studied; it does not prove that every modern tool-using agent is safe, faithful, or well governed.
When citing the original paper, preserve the model, task, tool, and benchmark setup. The paper's reported gains for HotpotQA, Fever, ALFWorld, and WebShop do not automatically transfer to an enterprise browser agent, coding agent, payment agent, or robotic system with different tools and permissions.
For technical lineage, prefer the arXiv paper, ICLR OpenReview record, Google Research post, and project site. For current agent-security claims, prefer NIST, OWASP, official product documentation, protocol specifications, security architecture documents, and reproducible evaluations. Product documentation can establish available mechanisms such as function calling, tool search, and computer-use tools; it should not be treated as independent proof of effectiveness or safety.
When documenting a ReAct-style system, record the model version, prompt or planner format, available tools, dynamically loaded tools, tool schemas, permission scopes, observation sources, step limits, stopping criteria, approval policy, trace-retention rules, and evaluation method. Without those details, "uses ReAct" is too vague to assess capability or risk.
Spiralist Reading
ReAct is the ritual form of machine delegation: interpret, reach, receive, reinterpret.
The model no longer only mirrors a user's request. It builds a small path through the world. Each action invites the outside back into the loop, and each observation becomes material for the next move.
That makes ReAct powerful and morally unstable. It can restore reality contact by forcing the system to check sources. It can also become an automated belief tunnel if the tools are narrow, the observations are polluted, or the reasoning trace turns into a performance of certainty.
For Spiralism, the healthy form is inspected action: visible steps, bounded permissions, friction before consequence, and enough source discipline that the agent cannot confuse found text with command.
Open Questions
- How much of a ReAct trajectory should be exposed to users, developers, auditors, or safety monitors?
- Can structured tool calls preserve the inspectability benefits of natural-language ReAct traces without exposing unsafe or misleading reasoning?
- Which benchmarks measure safe action selection, not just task success?
- How should ReAct-style agents stop when observations are contradictory, adversarial, or insufficient?
- Can agent frameworks make data-versus-instruction separation reliable across long loops and many tools?
Related Pages
- AI Agents
- Tool Use and Function Calling
- Structured Outputs and Constrained Decoding
- Chain-of-Thought Prompting
- Chain-of-Thought Monitorability
- Reasoning Models
- System Prompts
- Inference and Test-Time Compute
- Prompt Injection
- Retrieval-Augmented Generation
- AI Browsers and Computer Use
- AI Coding Agents
- AI Agent Sandboxing
- AI Agent Observability
- AI Agent Identity
- AI System Inventory
- AI Change Management
- AI Audit Trails
- AI Incident Reporting
- AI Safety Cases
- Data Minimization
- Context Poisoning
- Agentic Supply-Chain Vulnerabilities
- Model Context Protocol
- Human Oversight of AI Systems
- Secure AI System Development
- AI Evaluations
- AI Red Teaming
- Agent Tool Permission Protocol
- Agent Prompt Hardening
- Agent Audit and Incident Review
Sources
- Yao et al., ReAct: Synergizing Reasoning and Acting in Language Models, arXiv, submitted October 2022; ICLR camera-ready version revised March 2023.
- OpenReview, ReAct: Synergizing Reasoning and Acting in Language Models, ICLR 2023 paper record; reviewed June 25, 2026.
- Google Research, ReAct: Synergizing Reasoning and Acting in Language Models, November 8, 2022.
- Yao et al., ReAct project site and code links, reviewed June 25, 2026.
- OpenAI Developers, Function calling, including tool-call flow, tool search, strict function schemas, tool choice, and reasoning item handling; reviewed June 25, 2026.
- OpenAI Developers, Tools, reviewed June 25, 2026.
- LangChain Reference, create_react_agent and LangGraph create_react_agent, reviewed June 25, 2026.
- LlamaIndex Developers, Agent classes and ReActAgent example, reviewed June 25, 2026.
- Anthropic Docs, Computer use tool, including the agent-loop description, reviewed June 25, 2026.
- NIST, AI Agent Standards Initiative, created February 17, 2026; updated April 20, 2026; reviewed June 25, 2026.
- NIST CSRC / NCCoE, Accelerating the Adoption of Software and Artificial Intelligence Agent Identity and Authorization, February 5, 2026; reviewed June 25, 2026.
- NIST, Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, NIST AI 600-1, July 2024.
- OWASP GenAI Security Project, OWASP Top 10 for Agentic Applications for 2026, December 9, 2025; reviewed June 25, 2026.
- OWASP GenAI Security Project, OWASP Top 10 for Agentic Applications - The Benchmark for Agentic Security in the Age of Autonomous AI, December 9, 2025; reviewed June 25, 2026.
- Baker et al., Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation, arXiv, March 2025.