Blog · arXiv Analysis · Last reviewed July 2, 2026

The Communication Policy Becomes the Agent Interface

Proactive agents are usually judged by what they ask. This paper asks the more operational question: should the agent ask in free text, render a structured interface, or switch between the two? Once that choice is optimized, the channel becomes part of the agent's policy, not just a UX detail.

The Paper

The paper is Communication Policy Evolution for Proactive LLM Agents, arXiv:2606.14314 [cs.AI], by Xinbei Ma, Jiyang Qiu, Yao Yao, Zheng Wu, Yijie Lu, Xiangmou Qu, Jiaxin Yin, Xingyu Lou, Jun Wang, Weiwen Liu, Weinan Zhang, Zhuosheng Zhang, and Hai Zhao. arXiv lists version 1 as submitted on June 12, 2026, with DOI 10.48550/arXiv.2606.14314. The affiliations shown in the arXiv HTML are Shanghai Jiao Tong University and OPPO Research Institute.

The authors frame Communication Policy as a missing layer in proactive LLM agents. Prior work often studies how an agent decides what information is missing. This paper studies how the agent should communicate to recover that information.

The Information Gap

The setup starts from asymmetry. The user or planner holds a full task specification, including constraints, preferences, and edge cases. The agent receives a vague or partial specification and must close the gap through interaction before acting.

The paper studies this in two settings. In the User-Agent setting, the agent asks the user directly. In the Planner-Executor setting, an executor acts in an environment while communicating with a planner. In both cases, communication is not free: disclosing information can be costly or sensitive, and answers are shaped by impatience, ambiguity, and persona-specific constraints.

That matters because a clarification question is not neutral. A broad natural-language question can demand recall, expose private context, or leave ambiguity unresolved. A structured form can reduce ambiguity and prompt recognition rather than recall, but it can also overconstrain the user, hide assumptions in the options it offers, or make disclosure feel mandatory.

The Channel Policy

The paper gives agents two communication primitives. ask_question is the free-form text channel. generate_ui produces HTML-based forms for structured input. The evaluation compares text-only, UI-only, and hybrid policies in which the agent can choose a channel at each communication turn.
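For readers who think in tool schemas, the two primitives can be sketched as functions that return a payload for the user-facing layer. This is a minimal Python sketch, not the paper's implementation: the primitive names come from the paper, but the signatures, the hypothetical `FormField` type, and the HTML layout are assumptions.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class FormField:
    """One input in a hypothetical generate_ui form."""
    name: str
    label: str
    options: list[str] = field(default_factory=list)  # empty -> free-text input


def ask_question(text: str) -> dict:
    # Free-form text channel: the payload is just the question string.
    return {"channel": "ask_question", "text": text}


def generate_ui(title: str, fields: list[FormField]) -> dict:
    # Structured channel: render an HTML form with one control per field.
    parts = [f"<form><h3>{escape(title)}</h3>"]
    for f in fields:
        name = escape(f.name)
        parts.append(f'<label for="{name}">{escape(f.label)}</label>')
        if f.options:
            opts = "".join(
                f'<option value="{escape(o)}">{escape(o)}</option>' for o in f.options
            )
            parts.append(f'<select id="{name}" name="{name}">{opts}</select>')
        else:
            parts.append(f'<input id="{name}" name="{name}" type="text">')
    parts.append('<button type="submit">Submit</button></form>')
    return {"channel": "generate_ui", "html": "".join(parts)}


# Usage: a single factual query versus a multi-field form.
q = ask_question("Which Python version should the fix target?")
ui = generate_ui(
    "Trip preferences",
    [FormField("budget", "Budget", ["low", "medium", "high"]),
     FormField("city", "Destination city")],
)
```

Escaping every label and option with `html.escape` matters once the agent, rather than a developer, writes the form text.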

The main empirical result is complementarity. Text-based interaction often helps task performance because it is flexible and quick. Structured UI improves response quality and persona compliance because it organizes input and reduces ambiguity. Hybrid policies can combine those strengths, but simply giving an agent both channels is not enough. The policy has to decide when each channel is appropriate.

CPE

Communication Policy Evolution, or CPE, optimizes the communication-policy prompt rather than the model weights. The optimized text can include the system prompt, examples, or an environment-specific suffix. The agent and user-simulator models remain frozen.

Each CPE round evaluates the current policy on rollout episodes, has an LLM analyze the results and propose a structured JSON patch, applies the patch to the policy text, and then accepts or rejects the resulting candidate. The paper uses a two-stage gate: a candidate first has to improve on the training batch and then improve on held-out validation data. Because the policy with the best held-out score is retained, the tracked validation performance is monotonically non-decreasing.
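The round structure can be sketched as a loop. This is a minimal Python sketch under stated assumptions: `evaluate`, `reflect`, and `apply_patch` are hypothetical stand-ins for rollouts, the LLM reflection step, and policy mutation; the toy patch format (a dict with an `append` key) is invented; and here `reflect` sees only the policy and its score, whereas the paper's reflect model also receives trajectories, task specifications, and patch history. What it does show is the two-stage gate and the best-on-held-out bookkeeping.

```python
from typing import Callable


def cpe(
    policy: str,
    evaluate: Callable[[str, str], float],        # (policy, split) -> mean score
    reflect: Callable[[str, float], list[dict]],  # (policy, score) -> candidate patches
    apply_patch: Callable[[str, dict], str],      # (policy, patch) -> new policy text
    max_rounds: int = 30,
) -> tuple[str, float]:
    current = policy
    current_train = evaluate(current, "train")
    best, best_val = current, evaluate(current, "val")
    for _ in range(max_rounds):
        for patch in reflect(current, current_train):
            candidate = apply_patch(current, patch)
            cand_train = evaluate(candidate, "train")
            if cand_train <= current_train:    # stage 1: must improve on train batch
                continue
            cand_val = evaluate(candidate, "val")
            if cand_val <= best_val:           # stage 2: must improve on held-out data
                continue
            current, current_train = candidate, cand_train
            best, best_val = candidate, cand_val  # best_val never decreases
            break  # start the next round from the accepted policy
    return best, best_val


# Toy usage: the score is the number of "x" characters in the policy text.
def toy_eval(p: str, split: str) -> float:
    return float(p.count("x"))

def toy_reflect(p: str, score: float) -> list[dict]:
    return [{"append": "y"}, {"append": "x"}]  # first patch is useless

def toy_apply(p: str, patch: dict) -> str:
    return p + patch["append"]

best_policy, best_score = cpe("p", toy_eval, toy_reflect, toy_apply, max_rounds=5)
```

Note the asymmetry in the gate: a candidate that helps on the training batch but not on held-out data is discarded, which is what keeps the returned policy's validation score from ever going down.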

The reflection signal includes scores, trajectories, task specifications, the current policy, and the patch history. According to Appendix J, the reflect model receives up to five trajectory excerpts, restricted to ask_question and generate_ui turns. Across benchmarks, CPE uses a 0.7 train fraction and a maximum of 30 rounds, with 100 episodes per round for SWE-bench and 200 for TravelGym, tau^2-bench, and WebArena. Candidate counts vary by benchmark, and the reflect model is the same as the agent model. The paper reports that every run found its best policy within 25 rounds.
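Two details of that configuration are easy to express in code: the train/validation split and the trajectory excerpts handed to the reflect model. The sketch below is hypothetical. The numeric values (0.7, 30, 100 or 200 episodes, five excerpts) and the two tool names come from the paper; the config field names, the turn-dict format, and the rule of keeping the first five qualifying trajectories are assumptions, since how the paper selects excerpts is not described here.

```python
import random
from dataclasses import dataclass

# Reported CPE values; the field names themselves are hypothetical.
@dataclass(frozen=True)
class CPEConfig:
    train_fraction: float = 0.7
    max_rounds: int = 30
    episodes_per_round: int = 200   # 100 for SWE-bench
    max_excerpts: int = 5

COMM_TOOLS = {"ask_question", "generate_ui"}


def split_tasks(tasks, cfg: CPEConfig, seed: int = 0) -> tuple[list, list]:
    # Shuffle deterministically, then cut at the train fraction.
    shuffled = list(tasks)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * cfg.train_fraction)
    return shuffled[:cut], shuffled[cut:]


def reflection_excerpts(trajectories: list[list[dict]], cfg: CPEConfig) -> list[list[dict]]:
    # Keep only communication turns, skip trajectories without any,
    # and stop once the excerpt cap is reached.
    excerpts = []
    for traj in trajectories:
        comm_turns = [t for t in traj if t.get("tool") in COMM_TOOLS]
        if comm_turns:
            excerpts.append(comm_turns)
        if len(excerpts) == cfg.max_excerpts:
            break
    return excerpts


# Usage: a SWE-bench configuration differs only in episodes per round.
swe_cfg = CPEConfig(episodes_per_round=100)
```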

Experiments

The benchmark suite spans SWE-bench, TravelGym, tau^2-bench, and WebArena. The paper evaluates multiple model pairings, including Qwen3-32B, Qwen3-VL-32B, Seed-OSS-36B, GPT-5-mini, DeepSeek-V3.2, and GPT-4o in agent, user, or planner roles. Metrics include task success or productivity, response quality, proactivity, and persona compliance.

The persona set makes the channel problem concrete: amateur users should not be asked professional questions, do_selection users prefer constrained choices, one_question users tolerate only one query, and answer_more users can handle fuller forms. The same channel can be helpful or harmful depending on the persona and task state.
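One way to make these persona constraints testable is to encode them as rules over a dialogue's communication turns. The sketch below is hypothetical: the persona names come from the paper, but the `Turn` format, the `professional` tag, the form-size cap, and each rule's encoding are illustrative guesses at what a persona implies, not the paper's persona-compliance metric.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    channel: str                  # "ask_question" or "generate_ui"
    professional: bool = False    # hypothetical tag: requires domain expertise
    options: list[str] = field(default_factory=list)  # choices offered, if any
    n_fields: int = 1             # number of inputs requested in this turn

# Hypothetical cap: only answer_more users are assumed to accept larger forms.
MAX_FIELDS_DEFAULT = 3


def persona_violations(persona: str, turns: list[Turn]) -> list[str]:
    problems = []
    if persona == "one_question" and len(turns) > 1:
        problems.append(f"{len(turns)} queries; persona tolerates one")
    for i, t in enumerate(turns):
        if persona == "amateur" and t.professional:
            problems.append(f"turn {i}: professional question to amateur user")
        if persona == "do_selection" and not t.options:
            problems.append(f"turn {i}: open-ended query to do_selection user")
        if persona != "answer_more" and t.n_fields > MAX_FIELDS_DEFAULT:
            problems.append(f"turn {i}: form with {t.n_fields} fields")
    return problems
```

The point of the exercise is the one the paragraph makes: the same `Turn` can be compliant for one persona and a violation for another.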

The CPE-discovered policies converge on interpretable heuristics. They tend to follow an environment-first pattern: explore the repository, page, or database first and clarify afterward. They use text for simple single factual queries and UI for structured input with several fields, visual layouts, or option comparisons. Some policies discover a 2-strikes escalation rule after repeated unknown or timeout responses. The optimized policies also adapt to persona constraints rather than treating every user as a generic information source.
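These heuristics are simple enough to read as a decision function. The Python sketch below paraphrases them and is not a policy the paper released; the discovered policies are prompt text interpreted by the model. The `Need` type, the rule that more than one field or any option comparison means UI, the choice to count only consecutive trailing failures as strikes, and the `escalate` action name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Need:
    n_fields: int        # how many values the agent still needs
    has_options: bool    # does the answer involve comparing known choices?
    explored_env: bool   # has the agent inspected the repo/page/database yet?

STRIKE_LIMIT = 2  # "2-strikes": escalate after two unknown/timeout replies


def next_action(need: Need, recent_replies: list[str]) -> str:
    # Environment first: look before asking.
    if not need.explored_env:
        return "explore_environment"
    # Count consecutive unhelpful replies at the end of the history.
    strikes = 0
    for reply in reversed(recent_replies):
        if reply in ("unknown", "timeout"):
            strikes += 1
        else:
            break
    if strikes >= STRIKE_LIMIT:
        return "escalate"  # hypothetical placeholder for the escalation step
    # Single factual query -> text; several fields or option comparison -> UI.
    if need.n_fields <= 1 and not need.has_options:
        return "ask_question"
    return "generate_ui"
```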

Governance Standard

A proactive-agent evaluation should ship with a communication receipt. For the task, the receipt should record the full task specification, the vague specification shown to the agent, the hidden or underspecified fields, and the sensitivity-cost definitions. For the policy, it should record the channel options, the communication-policy prompt, few-shot examples, generated HTML, and text prompts. For the evaluation setup, it should record persona definitions, the simulator prompt, the model pair, the environment, and rollout logs. For the optimization trail, it should record the CPE policy version, JSON patches, accept and reject gates, and the held-out split. Finally, it should report task success, response quality, proactivity, and persona compliance, alongside the escalation rule, human override, and a transcript or UI archive.
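Treated as a schema, the receipt could be one record per evaluation run. The Python dataclass below is a hypothetical sketch: every field name is invented, the fields correspond to the items named in the paragraph above grouped by role, and `missing_fields` is an illustrative completeness check an auditor might run, not part of any existing standard.

```python
from dataclasses import dataclass, field, fields


@dataclass
class CommunicationReceipt:
    # Task inputs
    full_spec: str = ""
    vague_spec: str = ""
    hidden_fields: list[str] = field(default_factory=list)
    sensitivity_costs: dict[str, float] = field(default_factory=dict)
    # Policy and channels
    channel_options: list[str] = field(default_factory=list)
    policy_prompt: str = ""
    few_shot_examples: list[str] = field(default_factory=list)
    generated_html: list[str] = field(default_factory=list)
    text_prompts: list[str] = field(default_factory=list)
    # Evaluation setup
    persona_definitions: dict[str, str] = field(default_factory=dict)
    simulator_prompt: str = ""
    model_pair: dict[str, str] = field(default_factory=dict)  # e.g. {"agent": ..., "user": ...}
    environment: str = ""
    rollout_logs: list[str] = field(default_factory=list)
    # Optimization trail
    cpe_policy_version: str = ""
    json_patches: list[dict] = field(default_factory=list)
    gate_decisions: list[str] = field(default_factory=list)  # accept/reject per candidate
    heldout_split: list[str] = field(default_factory=list)
    # Outcomes and oversight
    metrics: dict[str, float] = field(default_factory=dict)  # success, quality, proactivity, compliance
    escalation_rule: str = ""
    human_override: str = ""
    transcript_archive: str = ""


def missing_fields(receipt: CommunicationReceipt) -> list[str]:
    # Every field type here is empty-falsy ("" / [] / {}), so `not value` flags gaps.
    return [f.name for f in fields(receipt) if not getattr(receipt, f.name)]
```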

The governance issue is disclosure shaping. A system that chooses between ask_question and generate_ui is deciding not only how to gather information, but how much friction, ambiguity, and pressure the user experiences. The channel can determine what gets disclosed, what gets omitted, and which assumptions become embedded in the task state.

This connects directly to AI Agents, AI Browsers and Computer Use, AI Agent Observability, AI Evaluations, AI Audits and Assurance, The Agent Communication Graph Becomes Metadata, The GUI Uncertainty Becomes the Handoff Budget, and The Workspace Becomes the Digital Colleague. Agent communication is infrastructure. It needs logs, versioning, safety cases, and user-side contestability.

Limits

The largest limitation is the reliance on simulation. The paper's personas and user simulators make controlled comparison possible, but real users bring fatigue, anxiety, privacy caution, social pressure, accessibility needs, and inconsistent willingness to answer. A policy that performs well against a simulator may still pressure real users into over-disclosure.

Generated UI adds another boundary. HTML forms can reduce ambiguity, but they can also manipulate defaults, omit options, privilege one task interpretation, or collect more structured data than the user intended to provide. Any deployment of CPE-like optimization should audit not only task success but the form itself.
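Auditing the form itself can start with mechanical checks on the generated HTML. The sketch below uses Python's standard-library `html.parser` to flag three patterns that can shape disclosure: pre-selected defaults, required fields, and dropdowns that offer no way to decline. The check list, the `FormAudit` class, and the `OPT_OUT_WORDS` phrasings are hypothetical heuristics, not an audit procedure from the paper, and they cannot detect subtler problems such as a missing option that should have been offered.

```python
from html.parser import HTMLParser

OPT_OUT_WORDS = ("prefer not", "skip", "not sure")  # hypothetical opt-out phrasings


class FormAudit(HTMLParser):
    """Flags form features that can shape disclosure (heuristic, not exhaustive)."""

    def __init__(self):
        super().__init__()
        self.findings: list[str] = []
        self._in_select = None          # name of the <select> being parsed
        self._option_texts: list[str] = []
        self._in_option = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # boolean attributes appear with value None
        name = a.get("name", "?")
        if "checked" in a or "selected" in a:
            self.findings.append(f"pre-selected default: <{tag}> in {self._in_select or name}")
        if "required" in a:
            self.findings.append(f"required field: {name}")
        if tag == "select":
            self._in_select, self._option_texts = name, []
        elif tag == "option":
            self._in_option = True
            self._option_texts.append("")

    def handle_data(self, data):
        if self._in_option and self._option_texts:
            self._option_texts[-1] += data

    def handle_endtag(self, tag):
        if tag == "option":
            self._in_option = False
        elif tag == "select" and self._in_select is not None:
            texts = [t.strip().lower() for t in self._option_texts]
            if not any(w in t for t in texts for w in OPT_OUT_WORDS):
                self.findings.append(f"no opt-out option in select: {self._in_select}")
            self._in_select = None


def audit(html: str) -> list[str]:
    parser = FormAudit()
    parser.feed(html)
    parser.close()
    return parser.findings
```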

The Spiralist reading is that the interface is not downstream from the agent. It is where the agent's uncertainty becomes a request, a constraint, and sometimes a quiet demand.

Sources

Xinbei Ma, Jiyang Qiu, Yao Yao, Zheng Wu, Yijie Lu, Xiangmou Qu, Jiaxin Yin, Xingyu Lou, Jun Wang, Weiwen Liu, Weinan Zhang, Zhuosheng Zhang, and Hai Zhao, Communication Policy Evolution for Proactive LLM Agents, arXiv:2606.14314 [cs.AI], submitted June 12, 2026.
arXiv HTML: Communication Policy Evolution for Proactive LLM Agents, reviewed for affiliations, abstract, problem formulation, communication primitives, User-Agent and Planner-Executor settings, CPE algorithm, benchmark setup, personas, optimized-policy patterns, CPE configuration, and LLM-usage disclosure.
arXiv PDF: Communication Policy Evolution for Proactive LLM Agents.
Related pages: AI Agents, AI Browsers and Computer Use, AI Agent Observability, AI Evaluations, AI Audits and Assurance, The Agent Communication Graph Becomes Metadata, The GUI Uncertainty Becomes the Handoff Budget, and The Workspace Becomes the Digital Colleague.
