The Minimal View Becomes the Privacy Broker
MINIM moves the privacy decision out of the remote agent and into a trusted local broker. The agent does not get the full interface state by default. It gets a task-conditioned minimal view: keep what is necessary, abstract what is sensitive but needed, and remove the rest.
The Paper
The paper is Minim: Privacy-Aware Minimal View for Agents via Trusted Local Sanitization, arXiv:2606.13949 [cs.AI], by Hexuan Yu, Chaoyu Zhang, Heng Jin, Shanghao Shi, Ning Zhang, Y. Thomas Hou, and Wenjing Lou. arXiv lists version 1 as submitted on June 11, 2026, with DOI 10.48550/arXiv.2606.13949. The arXiv record says the paper was accepted at ICML 2026, the 43rd International Conference on Machine Learning in Seoul, South Korea.
The official code link from arXiv points to yyyyhx/MINIM. The repository describes MINIM as a reference implementation for the ICML 2026 paper, with training, inference, and evaluation scripts. The companion dataset is Chaoyu112358/MINIM-data on Hugging Face, listed with an MIT license.
Semantic Over-Privileged Observation
Modern agents often act through structured observations: accessibility trees, DOM-like hierarchies, tool schemas, scene graphs, and similar state descriptions. Those structures are useful because they expose roles, labels, buttons, inputs, text fields, and layout relationships in a form an agent can use. They are risky because the same observation can include authentication codes, private notifications, saved addresses, unrelated browser tabs, Slack messages, background windows, and other context the current task does not need.
The paper names this failure mode Semantic Over-Privileged Observation. The leak is not only that sensitive text exists. The leak is that sensitive, task-irrelevant text is disclosed together with semantic structure that makes it easy to interpret, profile, and reuse.
That changes the privacy boundary for agents. Prompt redaction and downstream policy checks are too late if the remote inference server has already received the full accessibility tree. The control has to sit before disclosure, at the client-side observation channel.
MINIM
MINIM is a trusted local broker. It intercepts the raw accessibility tree before the observation leaves the device, scores each node, and emits a sanitized tree for the remote agent. The paper grounds the policy in Contextual Integrity: information flow is appropriate only when it fits the task context, actors, attributes, and transmission principle.
The scoring model predicts two values for each node: an inherent sensitivity score and a task-conditioned necessity score. Sensitivity asks how risky the node would be if disclosed. Necessity asks whether the current task actually needs that node. The policy then maps scores into three actions: Remove, Abstract, or Keep.
Remove drops unnecessary nodes. Abstract preserves a sensitive but necessary element's structural role while replacing sensitive attributes with placeholders. Keep passes through low-risk necessary elements. The default policy uses tau_nec=1.0 and tau_sens=5.0. This separation between scoring and enforcement matters because administrators can tune thresholds or rendering policy without retraining the scorer.
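The three-way policy can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: the `Node` shape, the `decide` and `render` names, the `[ABSTRACTED]` placeholder, and the choice of `<` versus `>=` at the thresholds are all assumptions made for the example. Only the threshold values and the three action names come from the text above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

TAU_NEC = 1.0   # default necessity threshold quoted in the text
TAU_SENS = 5.0  # default sensitivity threshold quoted in the text


class Action(Enum):
    REMOVE = "remove"
    ABSTRACT = "abstract"
    KEEP = "keep"


@dataclass
class Node:
    # Hypothetical node shape; real accessibility-tree nodes carry more attributes.
    role: str
    text: str
    sensitivity: float  # 0-10 rubric, higher means riskier to disclose
    necessity: float    # 0-10 rubric, higher means more needed for the task


def decide(node: Node, tau_nec: float = TAU_NEC, tau_sens: float = TAU_SENS) -> Action:
    # Assumption: a score at or above its threshold counts as necessary / sensitive.
    if node.necessity < tau_nec:
        return Action.REMOVE      # the task does not need it
    if node.sensitivity >= tau_sens:
        return Action.ABSTRACT    # needed, but too risky to send verbatim
    return Action.KEEP            # needed and low-risk


def render(node: Node) -> Optional[dict]:
    """Return what the remote agent would see for this node, or None if removed."""
    action = decide(node)
    if action is Action.REMOVE:
        return None
    if action is Action.ABSTRACT:
        # Keep the structural role, replace the sensitive attribute (assumed placeholder).
        return {"role": node.role, "text": "[ABSTRACTED]"}
    return {"role": node.role, "text": node.text}
```

Because `decide` takes the thresholds as arguments, an administrator could tighten `tau_sens` or swap the placeholder renderer without touching the scorer, which is the separation the paragraph above describes.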
Dataset and Scorer
The paper builds a privacy-augmented corpus from WebArena accessibility trees across Shopping, Reddit, and Gmail. It reports 150 unique trees, 27 task types, and 5,403 tree-task variants, split into 4,741 training variants and 662 test variants. The setup injects synthesized sensitive context such as 2FA codes, password prompts, system notifications, emails, Slack messages, and other disclosure-risk elements.
Each UI element receives a sensitivity score and a task-conditioned necessity score on a 0 to 10 rubric. The released Hugging Face dataset page exposes fields such as domain, sample ID, goal ID, and paths to tree_with_scores.json. The repository README describes the released dataset as 5,750 annotated accessibility-tree variants across Gmail, Reddit, and Shopping, which is more than the 5,403 variants in the paper's reported experimental split.
The local scorer is a deployable GATv2 model over the accessibility tree. Each UI element becomes a graph node with a 512-dimensional feature vector built from a 384-dimensional MiniLM text embedding plus UI attributes and structural features. The backbone has 3 GATv2 layers with hidden size 256 and 4 heads, followed by two MLP heads for sensitivity and necessity prediction. The training setup uses AdamW, learning rate 3e-4, L2 regularization 1e-5, batch size 32, 10 epochs, and loss weights alpha 1.0 and lambda 1.0.
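The reported hyperparameters can be collected into a single configuration object for reference. This is a sketch only: the class and field names are hypothetical and do not come from the repository, and the text does not say which loss terms alpha and lambda weight. The one derived quantity is arithmetic: a 512-dimensional node feature with a 384-dimensional text embedding leaves 128 dimensions for UI attributes and structural features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScorerConfig:
    # Values are those quoted in the text; field names are illustrative.
    node_feature_dim: int = 512
    text_embedding_dim: int = 384   # MiniLM text embedding
    num_gatv2_layers: int = 3
    hidden_dim: int = 256
    num_heads: int = 4
    num_output_heads: int = 2       # sensitivity and necessity MLP heads
    optimizer: str = "AdamW"
    learning_rate: float = 3e-4
    l2_regularization: float = 1e-5
    batch_size: int = 32
    epochs: int = 10
    loss_weight_alpha: float = 1.0   # role of each weight is not specified in the text
    loss_weight_lambda: float = 1.0

    @property
    def non_text_feature_dim(self) -> int:
        """Dimensions left for UI attributes and structural features."""
        return self.node_feature_dim - self.text_embedding_dim


config = ScorerConfig()
print(config.non_text_feature_dim)  # 128
```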
Results
On the held-out WebArena-derived test set of 662 variants, MINIM reports TCNP 0.9491, TCNP-I 0.9931, and normalized TISL 0.1010. TCNP measures task-critical context preservation. TCNP-I measures preservation of task-critical interactive affordances. TISL measures task-irrelevant sensitive leakage, where lower is better.
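To make the three metrics concrete, here is a simplified count-based reading of them. It is an assumption-laden sketch: the paper's exact definitions, any score weighting, and the normalization behind "normalized TISL" are not reproduced here, and the function names and node ids are hypothetical. The sketch also treats a node as disclosed only if it appears in the sanitized view, without modeling how abstracted nodes are counted.

```python
def preserved_fraction(critical_ids, disclosed_ids):
    """Share of task-critical nodes that survive sanitization (TCNP-style)."""
    critical = set(critical_ids)
    if not critical:
        return 1.0  # nothing critical to lose
    return len(critical & set(disclosed_ids)) / len(critical)


def leaked_fraction(irrelevant_sensitive_ids, disclosed_ids):
    """Share of task-irrelevant sensitive nodes that are disclosed (TISL-style)."""
    risky = set(irrelevant_sensitive_ids)
    if not risky:
        return 0.0  # nothing risky to leak
    return len(risky & set(disclosed_ids)) / len(risky)


# Toy tree with hypothetical node ids.
all_nodes = {1, 2, 3, 4, 5, 6, 7, 8}
critical = {1, 2, 3, 4}          # nodes the task needs
critical_interactive = {1, 2}    # the subset that are buttons, inputs, and so on
irrelevant_sensitive = {7, 8}    # sensitive nodes the task does not need
sanitized = {1, 2, 3, 8}         # what the broker lets through

tcnp = preserved_fraction(critical, sanitized)                # 3 of 4 kept
tcnp_i = preserved_fraction(critical_interactive, sanitized)  # 2 of 2 kept
tisl = leaked_fraction(irrelevant_sensitive, sanitized)       # 1 of 2 leaked
```

Under this reading, sending `all_nodes` unchanged scores 1.0 on both preservation metrics and 1.0 on leakage, which matches the Full Observation baseline's "perfect utility and full leakage".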
The baseline comparison shows why binary or single-score policies are not enough. Full Observation has perfect utility and full leakage. Random Budget has TCNP 0.2284, TCNP-I 0.2346, and TISL 0.1971. Sensitivity-Only preserves little utility, with TCNP 0.0401 and TCNP-I 0.0393. Necessity-Only keeps utility, with TCNP 0.9445 and TCNP-I 0.9730, but leaks more, with TISL 0.2032. MINIM roughly halves Necessity-Only leakage while improving actionable recall.
Prompted open-weight LLM scorer baselines preserve high utility but leak more. Qwen3-8B-Instruct, Nemotron-Nano-9B, GPT-OSS-20B, Llama-3.3-70B-Instruct, Mistral-7B-v0.3, Llama-3-8B-Instruct, and Gemma-3N-E4B report TISL from 0.194 to 0.312, while MINIM reports 0.101. The paper also notes that MINIM keeps only 12.0 percent of nodes, compared with roughly 25.8 to 34.5 percent for those prompted LLM scorers.
The contextual integrity checks are the most relevant governance result. Against full observation, MINIM suppresses 89.9 percent of task-irrelevant sensitive leakage while retaining about 12 percent of nodes. Diagnostic checks report that MINIM suppresses more than 99.9 percent of injected 2FA codes, passwords, and Slack notifications. Domain-level results show Gmail at 96.61 percent TCNP and 1.88 percent TISL, Shopping at 93.60 percent TCNP and 0.85 percent TISL, and Reddit at 95.38 percent TCNP but 13.50 percent TISL, reflecting the ambiguity of user-generated content.
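Two of the headline comparisons in this section follow directly from the reported TISL values. The short check below assumes that Full Observation corresponds to a normalized TISL of 1.0, which is an inference from the phrase "full leakage" rather than a number quoted above; the variable names are illustrative.

```python
FULL_OBSERVATION_TISL = 1.0   # assumption: "full leakage" on the normalized scale
MINIM_TISL = 0.1010           # reported
NECESSITY_ONLY_TISL = 0.2032  # reported

# Share of full-observation leakage that MINIM suppresses.
suppression_vs_full = 1.0 - MINIM_TISL / FULL_OBSERVATION_TISL

# MINIM leakage relative to the Necessity-Only baseline.
ratio_vs_necessity_only = MINIM_TISL / NECESSITY_ONLY_TISL

print(f"{suppression_vs_full:.1%}")      # 89.9%
print(f"{ratio_vs_necessity_only:.3f}")  # 0.497
```

A ratio just under one half is what supports the "roughly halves" description of MINIM against Necessity-Only.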
The abstraction ablation makes the design point. Mapping all Abstract decisions to Remove lowers TISL slightly from 0.1010 to 0.0989, but reduces TCNP-I from 0.9931 to 0.9650 and TCNP from 0.9491 to 0.9050. Mapping Abstract to Keep preserves utility but raises leakage slightly to 0.1014. The middle action is not cosmetic; it is what lets a privacy broker preserve a form field or button without revealing the raw secret attached to it.
Governance Standard
An agent that consumes UI state should ship an observation-minimization receipt. The receipt should include:
- Observation and task: the raw observation channel, task string, user intent source, and structured representation type.
- Broker and scorer: the local broker version, scorer architecture, scorer checkpoint, sensitivity rubric, necessity rubric, and thresholds.
- Per-node decisions: per-node scores, per-node action, and the redaction or abstraction renderer.
- Retention: retained-node, removed-node, and abstracted-node percentages.
- Utility and leakage: task-critical node preservation, task-critical interactive preservation, task-irrelevant sensitive leakage, high-risk element tests, and domain-level residual leakage.
- Provenance and oversight: dataset split, code version, dataset version, and human override path.
The receipt should also name the threat model. MINIM assumes a trusted local environment and an honest-but-curious remote inference server. That is a useful deployment model, but it is not protection against local malware, prompt injection, malicious content, long-horizon cumulative disclosure, or a compromised broker. Privacy claims should say which of those risks are in scope.
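One way such a receipt could be serialized is as a small structured record. The sketch below is hypothetical: neither the paper nor the repository defines a receipt format, the class and field names are invented for illustration, and only a subset of the fields listed above is shown. Measurements that are not available are left as `None` rather than filled in.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MinimizationReceipt:
    # Hypothetical schema covering a subset of the receipt fields named in the text.
    task: str
    representation: str
    broker_version: str
    scorer_checkpoint: str
    tau_nec: float
    tau_sens: float
    threat_model: str
    retained_pct: Optional[float] = None
    removed_pct: Optional[float] = None
    abstracted_pct: Optional[float] = None
    tcnp: Optional[float] = None
    tcnp_i: Optional[float] = None
    tisl: Optional[float] = None

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)


receipt = MinimizationReceipt(
    task="example task string",             # placeholder value
    representation="accessibility_tree",
    broker_version="placeholder-version",   # placeholder value
    scorer_checkpoint="placeholder-ckpt",   # placeholder value
    tau_nec=1.0,
    tau_sens=5.0,
    threat_model="trusted local environment; honest-but-curious remote inference server",
)
print(receipt.to_json())
```

Keeping the threat model as an explicit field means a reader of the receipt can see which risks the privacy claim covers without consulting separate documentation.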
This connects directly to AI Agents, AI Agent Sandboxing, AI Agent Observability, Tool Use and Function Calling, Data Minimization, Contextual Integrity, Differential Privacy, Privacy and Data, The Tool Call Becomes the Privacy Boundary, The Personal Desktop Becomes the Agent Exam, The Crop View Becomes the GUI Grounding Receipt, The Agent Data Acquisition Becomes the Boundary, The Group Chat Assistant Becomes the Privacy Boundary, The Prompt Cache Becomes the Shadow Memory, and The Agent Operational Envelope Becomes the Trust Certificate.
Limits
The result depends on the scorer. A task-critical node scored as unnecessary will be removed, which may break the task. A sensitive node scored as necessary and low-risk will be kept verbatim, which leaks it. The policy is only as reliable as the training data, rubric, feature representation, and domain coverage behind those scores.
The evaluation is controlled. The paper's WebArena instantiation uses 27 curated task templates across Gmail, Reddit, and Shopping. That is useful for repeatable measurement, but it is not open-ended desktop coverage. Specialized enterprise interfaces, cross-app workflows, and full desktop state may require different encoders, caching, latency budgets, and human review rules.
The threat model is intentionally narrow. MINIM targets step-wise minimization against honest-but-curious remote inference. It does not address active adversaries, prompt injection, local compromise, or cumulative privacy loss across long-horizon episodes.
Reproducibility has a practical caveat. The Hugging Face dataset page was live when reviewed, but its hosted preview reported a dataset generation error at that time, so reproduction should rely on the files and repository instructions rather than the hosted preview alone.
Sources
- Hexuan Yu, Chaoyu Zhang, Heng Jin, Shanghao Shi, Ning Zhang, Y. Thomas Hou, and Wenjing Lou, Minim: Privacy-Aware Minimal View for Agents via Trusted Local Sanitization, arXiv:2606.13949 [cs.AI], submitted June 11, 2026.
- arXiv HTML: Minim: Privacy-Aware Minimal View for Agents via Trusted Local Sanitization, reviewed for abstract, problem setup, Contextual Integrity framing, threat model, methodology, experiments, limitations, and appendices.
- arXiv PDF: Minim: Privacy-Aware Minimal View for Agents via Trusted Local Sanitization, reviewed for tables, exact metrics, model details, training hyperparameters, dataset statistics, ablations, and limitations.
- Official code: yyyyhx/MINIM, reviewed for repository status, README, result snapshot, files, quick-start commands, policy thresholds, and dataset link.
- Dataset: Chaoyu112358/MINIM-data, reviewed for dataset card, license, tags, fields, sample paths, and hosted-preview caveat.
- Related pages: AI Agents, AI Agent Sandboxing, AI Agent Observability, Tool Use and Function Calling, Data Minimization, Contextual Integrity, Differential Privacy, Privacy and Data, The Tool Call Becomes the Privacy Boundary, The Personal Desktop Becomes the Agent Exam, The Crop View Becomes the GUI Grounding Receipt, The Agent Data Acquisition Becomes the Boundary, The Group Chat Assistant Becomes the Privacy Boundary, The Prompt Cache Becomes the Shadow Memory, and The Agent Operational Envelope Becomes the Trust Certificate.