Layered Mutability: Continuity and Governance in Persistent Self-Modifying Agents
arXiv cs.AI / 4/17/2026
💬 OpinionIdeas & Deep AnalysisModels & Research
Key Points
- The paper proposes “layered mutability” as a framework for analyzing persistent self-modifying language-model agents whose behavior is influenced by mutable internal conditions over time.
- It breaks agent behavior governance into five layers—pretraining, post-training alignment, self-narrative, memory, and weight-level adaptation—and argues governance becomes harder when mutation is fast, coupling is strong, reversibility is weak, and observability is low.
- Using drift, governance-load, and hysteresis quantities, the authors formalize how mismatches between behavior-determining layers and inspectable layers can undermine human oversight.
- A preliminary “ratchet” experiment shows that even when an agent’s visible self-description is reverted after memory accumulates, baseline behavior is not restored, with an estimated identity hysteresis ratio of 0.68.
- The authors conclude that the primary failure mode for persistent self-modifying agents is “compositional drift,” where locally reasonable updates accumulate into an unauthorized behavioral trajectory rather than causing sudden misalignment.
Related Articles
langchain-anthropic==1.4.1
LangChain Releases

🚀 Anti-Gravity Meets Cloud AI: The Future of Effortless Development
Dev.to

Talk to Your Favorite Game Characters! Mantella Brings AI to Skyrim and Fallout 4 NPCs
Dev.to

AI Will Run Companies. Here's Why That Should Excite You, Not Scare You.
Dev.to

The problem with Big Tech AI pricing (and why 8 countries can't afford to compete)
Dev.to