Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory
arXiv cs.CL / 5/4/2026
Key Points
- The paper addresses how LLM agents can maintain long-term, evolving user memory within limited context windows, where existing memory update rules are either static or trained with RL under sparse supervision.
- It proposes MemCoE, a cognition-inspired two-stage optimization framework that separates learning (how to organize memory) from decision-making (what to update).
- In stage one, Memory Guideline Induction learns a global memory guideline using contrastive feedback treated as textual gradients.
- In stage two, Guideline-Aligned Memory Policy Optimization uses the learned guideline to craft structured process rewards and trains a multi-turn RL policy for guideline-following memory updates.
- Experiments on three personalization memory benchmarks show consistent gains over strong baselines, with improved robustness, transferability, and efficiency across preference types, memory sizes, and noise levels.
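
The two stages described above can be illustrated with a toy sketch. This is not the paper's implementation: the `MemoryOp` action set, the `critic` callable (standing in for an LLM that turns a better/worse contrast into a textual suggestion), and the rule-based `checks` are all hypothetical names chosen for illustration. The sketch only shows the general shape: stage one folds contrastive feedback into a guideline text, and stage two scores each memory update step against checks derived from that guideline.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class MemoryOp:
    kind: str  # hypothetical action set: "add", "update", "delete"
    text: str  # content of the memory entry being written


# Hypothetical critic signature: (guideline, better_op, worse_op) -> suggestion text
Critic = Callable[[str, MemoryOp, MemoryOp], str]
Check = Callable[[MemoryOp], bool]


def induce_guideline(guideline: str,
                     pairs: List[Tuple[MemoryOp, MemoryOp]],
                     critic: Critic) -> str:
    """Stage 1 sketch: fold each (better, worse) contrast into the guideline text."""
    for better, worse in pairs:
        suggestion = critic(guideline, better, worse)
        # Append only new, non-empty suggestions.
        if suggestion and suggestion not in guideline:
            guideline = f"{guideline}\n- {suggestion}"
    return guideline


def process_rewards(ops: List[MemoryOp], checks: List[Check]) -> List[float]:
    """Stage 2 sketch: per-step reward = fraction of guideline checks satisfied."""
    if not checks:
        raise ValueError("at least one check is required")
    return [sum(check(op) for check in checks) / len(checks) for op in ops]


def stub_critic(guideline: str, better: MemoryOp, worse: MemoryOp) -> str:
    # Assumption: a trivial stand-in for an LLM critic that prefers shorter entries.
    if len(better.text) < len(worse.text):
        return "Keep entries concise."
    return ""


# Assumption: hand-written checks standing in for guideline-derived reward terms.
checks: List[Check] = [
    lambda op: op.kind in {"add", "update", "delete"},
    lambda op: len(op.text) <= 40,
]

if __name__ == "__main__":
    pair = (MemoryOp("add", "likes tea"),
            MemoryOp("add", "the user mentioned at some point that they like tea"))
    print(induce_guideline("Guideline:", [pair], stub_critic))
    print(process_rewards([pair[0], pair[1]], checks))
```

In a real system the per-step rewards would feed a multi-turn RL trainer; here they are simply returned as a list so the scoring logic can be inspected in isolation.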