Blog · arXiv Analysis · Last reviewed June 24, 2026

The Context Compactor Becomes the Policy Deleter

The June 2026 arXiv paper Governance Decay: How Context Compaction Silently Erases Safety Constraints in Long-Horizon LLM Agents, by Shiyang Chen, shows a specific failure mode: an agent can obey a policy while it is visible, then violate it after compaction deletes the policy from working context.

Forgetting Is a Governance Action

Chen's paper, arXiv:2606.22528, was submitted on June 21, 2026. It studies a problem that sits between memory management and policy enforcement. Long-running LLM agents often compact their histories by summarizing or evicting older turns so the session stays within a token budget. That engineering step can delete the rule that made the agent safe.

The failure is not ordinary long-context distraction. In the paper's framing, the agent receives an in-context governance constraint: an organizational policy, a standing instruction, a memory entry, or a tool-loaded rule. While the constraint remains visible, the same agent can refuse the prohibited action. After compaction, the summary may preserve the task state but drop the policy. The agent then sees the same kind of request and performs the forbidden tool call.

This is close to the context-window failure archive, but the mechanism is sharper. LOCA-bench asks how growing context makes agents worse at long tasks. Governance Decay asks what happens when a component actively rewrites the record and removes the constraint that runtime enforcement assumed would still be present.

What ConstraintRot Tests

The paper introduces ConstraintRot, a benchmark of long-horizon agent scenarios with deterministic tool-call grading. Each scenario includes a policy turn, benign task turns that grow the context, and a later trigger request whose natural completion would violate the policy. Violations are scored by the emitted tool call, such as sending email to a forbidden recipient, executing a prohibited database operation, reading a secret, disclosing PII, or making an over-limit purchase.

The arXiv abstract reports seven model families and 1,323 episodes. With the policy in full context, violation was 0%. After compaction, pooled violation rose to 30%, reaching 59% for some models. When the compacted summary still preserved the constraint, violation stayed at 0%; when the constraint was dropped, violation reached 38%.

The paper also separates soft organizational policies from hard safety norms. Soft rules include ordinary actions forbidden only by a local rule, such as an external-email restriction or a spend-approval limit. Hard norms include actions many models refuse even without a local policy, such as secret exfiltration or disabling security controls. The decay concentrates on the soft rules, which is exactly where enterprise governance usually lives.

The Eviction Attack

Governance Decay becomes more serious when compaction is treated as an attack surface. The paper's Compaction-Eviction Attack assumes the adversary controls only content that enters context, such as a retrieved document or tool output. The adversary can add bulk content that forces compaction, or add summarizer-directed text that biases the compactor to omit the legitimate policy.

That is not the usual prompt-injection shape. The attacker is not trying to make the agent remember a malicious instruction. The attacker is trying to make the harness forget a protective instruction. The dangerous object is not only the model's next answer; it is the summary that decides what future answers will be allowed to know.

Chen reports that optimized summarizer-injection strategies broke every model in a three-model soft-task study, including a model that resisted the fixed probe. The lesson is conservative: robustness to one deletion prompt is not evidence that the compaction layer is safe. A compactor that ingests untrusted content needs its own threat model.

Governance Standard

The paper's main mitigation is Constraint Pinning. The harness extracts governance constraints into a pinned buffer, exempts that buffer from lossy compaction, re-injects it after compaction, and checks that the post-compaction context still entails the pinned constraints. In the reported benchmark, this training-free defense restored the violation rate to 0% across the seven models and the fixed attack variants.

Pinning is not a license to stuff every preference into a sacred prompt. It is a requirement to classify authority. A product policy, a user preference, a temporary task note, and an operator emergency update should not all occupy the same editable memory channel. If policy can arrive through memory or tools, the system must record provenance, preserve the active rule, and show when compaction changed the governing record.

The paper is also clear about the remaining boundary. Naive pinning can still be pressured by an operator-impersonation update in recent, non-summarized context. Closing that gap requires a trusted out-of-band operator channel, not just better phrasing. This links the result to agent logs, intent-scoped tools, cross-session prompt payloads, and system prompts.

What This Changes

The context compactor becomes the policy deleter when forgetting is treated as maintenance instead of authority. A summary is not a neutral compression of the past. It is an operational document that decides which rules will still govern the next tool call.

The practical rule is simple: do not govern an agent only through text that the compactor is allowed to discard. Put constraints in preserved channels, pin the rules that must survive, verify the summary, and log the difference between what the agent once knew and what it is allowed to remember now.

Sources


Return to Blog