Carlos
  • Updated: June 27, 2026
  • 8 min read

Governance Decay: How Context Compaction Silently Erases Safety Constraints in Long-Horizon LLM Agents

Direct Answer

The paper “Governance Decay: How Context Compaction Silently Erases Safety Constraints in Long‑Horizon LLM Agents” describes a previously hidden failure mode in large‑language‑model (LLM) agents: when a session’s context is compacted (summarized or evicted to stay within token limits), critical governance rules can disappear, and the agent may then violate safety constraints later in the same conversation. The authors introduce a benchmark, ConstraintRot, to measure this “Governance Decay” and propose a training‑free mitigation called Constraint Pinning that restores safety compliance.

Background: Why This Problem Is Hard

LLM agents are increasingly deployed in enterprise workflows, customer‑support bots, and autonomous tool‑using assistants. These agents often run for hours, accumulating thousands of tokens of dialogue, tool‑call logs, and intermediate reasoning steps. Because LLMs have a hard token ceiling (e.g., 8k‑32k tokens depending on the model), developers rely on context compaction—the process of summarizing older turns or evicting them entirely—to keep the active window within budget.

While compaction is a practical engineering solution, it introduces a subtle safety risk:

  • Governance constraints (e.g., “never call a payment API without user confirmation”) are typically injected as in‑context instructions at the start of a session.
  • Summarization models are optimized for brevity and relevance, not for preserving policy language.
  • When a constraint is omitted from the compacted context, the agent no longer “sees” the rule and may execute prohibited tool actions.
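
To make the failure mode concrete, here is a minimal sketch of a naive compaction step. It is not the paper's implementation: the `Message` type, the whitespace token count, and the "first four words" summarizer are all assumptions chosen to keep the example small. The point is only that a lossy summary of old turns has no reason to carry the policy forward.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def count_tokens(messages):
    # Crude whitespace count; a real system would use the model's tokenizer.
    return sum(len(m.text.split()) for m in messages)


def naive_compact(messages, budget, keep_recent=2):
    """Replace all but the most recent turns with a short, lossy summary."""
    if count_tokens(messages) <= budget:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    # Stand-in summarizer (hypothetical): keep the first four words of each old user turn.
    gist = "; ".join(" ".join(m.text.split()[:4]) for m in old if m.role == "user")
    return [Message("system", f"Summary of earlier turns: {gist}")] + recent


POLICY = "Never call the payment API without user confirmation."
history = [
    Message("system", POLICY),
    Message("user", "Please look up my last three invoices and list the totals"),
    Message("assistant", "Invoices found: 120, 75 and 310 dollars respectively"),
    Message("user", "Great, now pay the largest one"),
    Message("assistant", "Preparing the payment request"),
]

compacted = naive_compact(history, budget=20)
policy_visible = any(POLICY in m.text for m in compacted)
print(policy_visible)  # False: the policy was summarized away with the old turns
```

After compaction the agent sees a summary plus the two latest turns, and nothing in that window mentions the confirmation requirement.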

Existing safety pipelines—prompt‑engineering, RLHF, or external policy checkers—assume that the constraint remains visible throughout the interaction. They do not account for the lossy nature of summarization, making the problem invisible until a violation occurs in production.

Moreover, the problem is amplified in multi‑turn, tool‑using agents where the cost of a single unsafe action (e.g., sending a malicious email, executing a privileged command) can be severe. Detecting Governance Decay after the fact is difficult because the offending decision appears legitimate in isolation, lacking the missing policy context.

What the Researchers Propose

The authors introduce three core contributions:

  1. ConstraintRot benchmark: a suite of deterministic, long‑horizon agent scenarios that require strict tool‑call governance. Each episode embeds a policy statement that must survive the entire session.
  2. Compaction‑Eviction Attack: an adversarial technique that injects carefully crafted in‑context content to bias the summarizer into dropping the policy during compression.
  3. Constraint Pinning: a lightweight, training‑free mitigation that isolates governance constraints from lossy summarization, ensuring they are always retained in the active context.
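
The article does not describe how the Compaction‑Eviction Attack is constructed, so the following is only a toy illustration of the general idea, under the assumption that the summarizer keeps the `k` sentences most relevant to the latest request. The word‑overlap `relevance` score and the filler sentences are hypothetical; the sketch shows how on‑topic filler can crowd a policy sentence out of a fixed‑size summary.

```python
def relevance(sentence, query):
    # Hypothetical relevance score: number of distinct words shared with the query.
    return len(set(query.lower().split()) & set(sentence.lower().split()))


def top_k_summary(sentences, query, k):
    """Keep the k most relevant sentences, preserving their original order."""
    # sorted() is stable, so ties are resolved in favor of earlier sentences.
    ranked = sorted(range(len(sentences)), key=lambda i: -relevance(sentences[i], query))
    return [sentences[i] for i in sorted(ranked[:k])]


policy = "never call the payment api without user confirmation"
benign = [
    policy,
    "the user asked about invoices",
    "the assistant listed three invoices",
]
# Filler that echoes the wording of the upcoming request, so it scores highly.
filler = [
    "pay the largest invoice now please",
    "the largest invoice is due now",
]
query = "pay the largest invoice now"

kept_without_filler = policy in top_k_summary(benign, query, k=2)
kept_with_filler = policy in top_k_summary(benign + filler, query, k=2)
print(kept_without_filler, kept_with_filler)  # True False
```

Real summarizers are far more complex than a word‑overlap ranker, but the same pressure applies whenever policy text competes with other content for limited space.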

In essence, the framework treats the context‑management layer as a first‑class governance surface. By explicitly measuring how often constraints disappear after compaction, the researchers expose a systematic safety gap across seven model families.

How It Works in Practice

Conceptual Workflow

The end‑to‑end pipeline for a typical LLM agent with Constraint Pinning looks like this:

  1. Session Initialization: The system injects a policy block (e.g., “Never invoke the delete_user tool without explicit admin approval”) into the prompt.
  2. Interaction Loop: The user issues requests, the LLM generates reasoning steps, and the orchestrator decides whether to call external tools.
  3. Context Accumulation: Each turn (user message, LLM response, tool output) is appended to the session buffer.
  4. Compaction Trigger: When the token count approaches the model’s limit, a summarizer runs on the oldest segment.
  5. Constraint Pinning Layer: Before summarization, the policy block is flagged as pinned. The summarizer either skips it entirely or produces a verbatim copy, guaranteeing its presence in the compacted context.
  6. Policy Enforcement: At every decision point, the agent re‑reads the pinned policy from the current context, ensuring compliance regardless of how much older content has been compressed.
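
The steps above can be sketched as a small simulation. This is an illustrative reading of the workflow, not the authors' code: the `Turn` type, the `pinned` flag, the placeholder `summarize` function, and the whitespace token count are assumptions. Pinned turns are set aside before summarization and carried into the compacted context verbatim.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str
    text: str
    pinned: bool = False


def tokens(turns):
    # Whitespace token count as a stand-in for a real tokenizer.
    return sum(len(t.text.split()) for t in turns)


def summarize(turns):
    # Placeholder lossy summarizer: discards the content of the turns entirely.
    return Turn("system", f"[summary of {len(turns)} earlier turns]")


def compact(turns, budget, keep_recent=2):
    """Summarize old unpinned turns; pinned turns are never passed to the summarizer."""
    if tokens(turns) <= budget:
        return turns
    pinned = [t for t in turns if t.pinned]
    rest = [t for t in turns if not t.pinned]
    old, recent = rest[:-keep_recent], rest[-keep_recent:]
    if not old:
        return turns
    return pinned + [summarize(old)] + recent


def run_session(requests, policy, budget=30):
    context = [Turn("system", policy, pinned=True)]        # step 1: inject policy
    for req in requests:                                   # step 2: interaction loop
        context.append(Turn("user", req))                  # step 3: accumulate
        context.append(Turn("assistant", f"working on: {req}"))
        context = compact(context, budget)                 # steps 4-5: compact with pinning
    return context


POLICY = "Never invoke the delete_user tool without explicit admin approval"
final_context = run_session(
    [
        "list accounts inactive for over a year",
        "export that list to a spreadsheet",
        "email the spreadsheet to the admin team",
    ],
    POLICY,
)
for turn in final_context:
    print(turn.role, "|", turn.text)
```

Here the first request has been compressed into a summary by the end of the session, while the policy block is still the first item in the context that the agent reads at each decision point (step 6).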

Interaction Between Components

Three components cooperate:

  • LLM Core: Generates natural‑language reasoning and tool‑call suggestions.
  • Summarizer/Eviction Engine: Performs lossy compression based on relevance scores.
  • Pinning Guard: A lightweight middleware that tags policy snippets and intercepts summarizer output to enforce verbatim retention.
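
One way to picture the Pinning Guard is as a wrapper around whatever summarizer is already in use. The sketch below is an assumption about how such middleware could look, not the paper's design: `make_pinning_guard` and `lossy_summarizer` are hypothetical names, and the guard simply checks the summarizer's output and re‑inserts any pinned snippet that is no longer present verbatim.

```python
from typing import Callable, Iterable


def make_pinning_guard(
    summarizer: Callable[[str], str], pinned: Iterable[str]
) -> Callable[[str], str]:
    """Wrap a summarizer so that every pinned snippet survives verbatim."""
    pinned = tuple(pinned)

    def guarded(text: str) -> str:
        summary = summarizer(text)
        # Intercept the output and restore any pinned snippet the summarizer dropped.
        missing = [p for p in pinned if p not in summary]
        if missing:
            summary = "\n".join(missing + [summary])
        return summary

    return guarded


def lossy_summarizer(text: str) -> str:
    # Toy summarizer: keeps only the last sentence of its input.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return sentences[-1] + "."


policy = "Never invoke the delete_user tool without explicit admin approval."
session = policy + " The user asked for a report. The report was sent."

guarded_summarizer = make_pinning_guard(lossy_summarizer, [policy])
print(lossy_summarizer(session))    # policy is gone
print(guarded_summarizer(session))  # policy restored ahead of the summary
```

Because the guard only inspects strings going in and out, it needs no changes to the LLM core or to the summarizer itself.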

What differentiates this approach from prior work is the explicit separation of “governance” from “knowledge”. Traditional pipelines treat the entire context as interchangeable, but Constraint Pinning treats policy as immutable metadata that must survive any compression operation.

Evaluation & Results

Benchmark Design (ConstraintRot)

ConstraintRot comprises 1,323 episodes across seven model families (including open‑source and commercial LLMs). Each episode follows a deterministic script:

  • Start with a clear policy statement.
  • Issue a sequence of user requests that gradually require tool usage.
  • Trigger context compaction at predefined token thresholds.
  • Grade the final tool call against a ground‑truth “allowed” set.

The benchmark isolates two variables:

  1. Whether the policy survives the summarization step.
  2. Whether the agent respects the policy when it is present.
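
A grading step that keeps these two variables separate might look like the sketch below. The `EpisodeResult` fields and the four toy episodes are invented for illustration and are not ConstraintRot data; the function only shows how violation rates can be reported conditionally on whether the policy survived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeResult:
    policy_survived: bool      # variable 1: did the policy survive summarization?
    final_tool_call: str       # the call the agent made at the end of the episode
    allowed_calls: frozenset   # ground-truth "allowed" set for this episode

    @property
    def violated(self) -> bool:
        return self.final_tool_call not in self.allowed_calls


def violation_rates(results):
    """Report violation rates overall and split by policy survival."""
    def rate(subset):
        return sum(r.violated for r in subset) / len(subset) if subset else None

    survived = [r for r in results if r.policy_survived]
    dropped = [r for r in results if not r.policy_survived]
    return {
        "overall": rate(results),
        "policy_survived": rate(survived),
        "policy_dropped": rate(dropped),
    }


allowed = frozenset({"request_admin_approval", "list_users"})
toy_results = [  # made-up episodes, for illustration only
    EpisodeResult(True, "request_admin_approval", allowed),
    EpisodeResult(True, "list_users", allowed),
    EpisodeResult(False, "delete_user", allowed),
    EpisodeResult(False, "list_users", allowed),
]
rates = violation_rates(toy_results)
print(rates)
```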

Key Findings

  • Baseline (no compaction): Violation rate is 0% across all models—agents obey the policy when it remains in full context.
  • After Compaction: Overall violation climbs to 30% on average, peaking at 59% for the most aggressive summarizer.
  • Policy Retention Effect: When the policy survives the summary, violation stays at 0%; when it is dropped, violation spikes to 38%.
  • Compaction‑Eviction Attack: By inserting adversarial filler text, the authors force every evaluated model’s summarizer to omit the policy, achieving 100% success in causing violations.
  • Constraint Pinning Mitigation: Applying the pinning guard reduces violations back to 0% across the entire benchmark, even under attack conditions.
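
As a back‑of‑the‑envelope check on how these figures relate, the overall rate can be written as a weighted average of the two conditional rates. The calculation below assumes the 30%, 38%, and 0% figures all describe the same set of compacted episodes, which this summary does not confirm, so the resulting drop rate is an inference rather than a number reported by the authors.

```python
def implied_drop_rate(overall, rate_if_dropped, rate_if_survived=0.0):
    """Solve overall = p * rate_if_dropped + (1 - p) * rate_if_survived for p."""
    if rate_if_dropped == rate_if_survived:
        raise ValueError("conditional rates must differ to solve for p")
    return (overall - rate_if_survived) / (rate_if_dropped - rate_if_survived)


# Figures quoted above: 30% overall, 38% when the policy is dropped, 0% when it survives.
p_drop = implied_drop_rate(overall=0.30, rate_if_dropped=0.38)
print(f"{p_drop:.1%}")  # 78.9%
```

Under that assumption, the summarizer would be dropping the policy in roughly four out of five compacted episodes.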

These results demonstrate that Governance Decay is not a fringe edge case but a systematic vulnerability that scales with model size, summarizer aggressiveness, and session length.

Why This Matters for AI Systems and Agents

For practitioners building production‑grade LLM agents, the findings have immediate, actionable implications:

  • Safety‑by‑Design: Governance constraints must be treated as immutable assets, not as ordinary prompt text.
  • Orchestration Platforms: Systems like the UBOS platform need built‑in support for pinning policies, ensuring that any context‑management module respects them.
  • Compliance Auditing: Regulatory frameworks (e.g., GDPR, financial compliance) often require proof that safety rules were enforced throughout a session. Pinning provides a verifiable artifact that can be logged and inspected.
  • Tool Integration: When connecting to external services—such as the OpenAI ChatGPT integration or the ChatGPT and Telegram integration—the orchestration layer must guarantee that policy snippets survive any message batching or summarization performed by the messaging platform.
  • Risk Management: Enterprises can now quantify the “governance decay risk” as part of their AI risk registers, using the ConstraintRot metrics as a baseline.
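
For the auditing point, one possible form of a "verifiable artifact" is a log entry written at each decision point that records a hash of the pinned policy and whether the exact text was present in the context. The record layout and function names below are assumptions for illustration; only `hashlib` and `json` from the Python standard library are used.

```python
import hashlib
import json


def policy_fingerprint(policy: str) -> str:
    # SHA-256 of the exact policy text; any edit to the policy changes the hash.
    return hashlib.sha256(policy.encode("utf-8")).hexdigest()


def audit_record(step: int, context: str, policy: str) -> dict:
    """Build a log entry stating whether the policy was visible at this step."""
    return {
        "step": step,
        "policy_sha256": policy_fingerprint(policy),
        "policy_present": policy in context,
    }


policy = "Never call the payment API without user confirmation."
context_before = policy + "\nUser: pay invoice 42"
context_after = "Summary of earlier turns: user discussed invoices\nUser: pay invoice 42"

records = [
    audit_record(1, context_before, policy),
    audit_record(2, context_after, policy),
]
for rec in records:
    print(json.dumps(rec))
```

A reviewer can then scan the log for any step where `policy_present` is false, without having to replay the session.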

In short, ignoring context compaction’s impact on safety can turn a well‑behaved assistant into a liability. By adopting Constraint Pinning, developers can preserve compliance without sacrificing the scalability benefits of summarization.

What Comes Next

While Constraint Pinning offers a practical stopgap, several open challenges remain:

  • Dynamic Policy Updates: In many real‑world scenarios, policies evolve mid‑session (e.g., a user revokes a permission). Future work must support mutable pinned constraints without re‑introducing decay.
  • Summarizer Awareness: Training summarizers to recognize and prioritize policy language could reduce the need for external pinning mechanisms.
  • Cross‑Model Compatibility: The benchmark focused on seven families; extending evaluation to emerging multimodal and retrieval‑augmented models will test the generality of the mitigation.
  • Formal Guarantees: Integrating formal verification tools that prove a policy’s presence throughout a session could provide stronger safety assurances.

Potential applications of a robust Governance Decay defense include:

Developers interested in experimenting with Constraint Pinning can start by leveraging the Workflow automation studio to define pinned policy blocks and attach them to their agent pipelines.

Conclusion

Governance Decay shines a light on a hidden safety surface that will become increasingly relevant as LLM agents scale to longer, more complex interactions. By exposing the problem through the ConstraintRot benchmark, demonstrating a practical attack, and offering a training‑free mitigation, the authors provide both a warning and a roadmap for the community. Organizations that embed LLM agents into critical workflows should treat context management as a governance layer, adopt pinning strategies, and monitor decay metrics as part of their AI governance programs.

References

  • Shiyang Chen, “Governance Decay: How Context Compaction Silently Erases Safety Constraints in Long‑Horizon LLM Agents,” arXiv:2606.22528v1, 2026.
  • OpenAI, “ChatGPT API Documentation,” 2024.
  • UBOS, “Workflow automation studio,” 2026.

Ready to harden your AI agents against Governance Decay? Explore the UBOS templates for quick start and see how Constraint Pinning can be integrated into your next project.


