Gated Memory Policy

arXiv cs.AI / 4/22/2026

Key Points

  • Robotic manipulation tasks can be Markovian or non-Markovian, and naively extending observation history can cause large performance drops due to distribution shift and overfitting.
  • The proposed Gated Memory Policy (GMP) learns both when to recall historical context (via a learned memory gate) and what information to store and retrieve (via a lightweight cross-attention module).
  • GMP improves robustness by injecting diffusion noise into historical actions, which reduces its sensitivity to noisy or inaccurate histories during both training and inference.
  • On MemMimic, a non-Markovian benchmark proposed by the authors, GMP reports a 30.1% average success-rate improvement over long-history baselines, while still performing competitively on Markovian tasks in RoboMimic.
  • The authors provide code, data, and deployment instructions via the project website.
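To make the "when to recall" and "what to recall" mechanisms concrete, here is a minimal plain-Python sketch of a gated memory readout. A scalar gate computed from the current observation features decides whether history is consulted at all. If it is, single-head cross-attention lets the current features (the query) pull a latent memory vector from past-step tokens (keys and values). This is not the authors' implementation. The function names, the sigmoid gate, the 0.5 threshold, and the hard on/off decision are assumptions made for illustration, and the summary does not say how GMP parameterizes or trains its gate.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cross_attention(query, keys, values):
    """Single-head scaled dot-product attention: the current step (query) reads past-step tokens."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def gated_memory_readout(query, keys, values, gate_w, gate_b, threshold=0.5):
    """Hypothetical gated memory: the gate decides when to recall, attention decides what to recall.

    Returns (recall_probability, memory_vector). Assumes value vectors share the
    query's dimension so that a closed gate can return a same-sized zero vector.
    """
    p_recall = sigmoid(dot(gate_w, query) + gate_b)
    if p_recall < threshold or not keys:
        # Gate closed (or no history yet): behave like a memoryless, Markovian policy.
        return p_recall, [0.0] * len(query)
    return p_recall, cross_attention(query, keys, values)

# Toy usage with made-up 2-D features for two past steps.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[2.0, 0.0], [0.0, 2.0]]

p_open, memory = gated_memory_readout(query, keys, values, gate_w=[5.0, 0.0], gate_b=0.0)
p_closed, no_memory = gated_memory_readout(query, keys, values, gate_w=[-5.0, 0.0], gate_b=0.0)
```

Because the first history key matches the query, it receives more attention weight, so the recalled memory leans toward the first value vector. The hard threshold keeps the policy fully reactive when the gate is closed; a soft gate that scales the memory by p_recall would be an equally plausible alternative, and which one GMP uses is not stated here.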

Abstract

Robotic manipulation tasks exhibit varying memory requirements, ranging from Markovian tasks that require no memory to non-Markovian tasks that depend on historical information spanning single or multiple interaction trials. Surprisingly, simply extending observation histories of a visuomotor policy often leads to a significant performance drop due to distribution shift and overfitting. To address these issues, we propose Gated Memory Policy (GMP), a visuomotor policy that learns both when to recall memory and what to recall. To learn when to recall memory, GMP employs a learned memory gate mechanism that selectively activates history context only when necessary, improving robustness and reactivity. To learn what to recall efficiently, GMP introduces a lightweight cross-attention module that constructs effective latent memory representations. To further enhance robustness, GMP injects diffusion noise into historical actions, mitigating sensitivity to noisy or inaccurate histories during both training and inference. On our proposed non-Markovian benchmark MemMimic, GMP achieves a 30.1% average success rate improvement over long-history baselines, while maintaining competitive performance on Markovian tasks in RoboMimic. All code, data and in-the-wild deployment instructions are available on our project website https://gated-memory-policy.github.io/.
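The noise-injection idea in the abstract can be illustrated with the standard DDPM forward process, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with ε ~ N(0, I), applied to the policy's past actions rather than to the actions it predicts. This is a sketch under stated assumptions: the linear β schedule, the 100-step horizon, drawing one random noise level per training sample, and using a fixed noise level at inference are all hypothetical choices, since the abstract does not specify which schedule or noise levels GMP uses.

```python
import math
import random

def linear_alpha_bars(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a linear beta schedule (assumed, DDPM-style)."""
    alpha_bars = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def noise_history(past_actions, t, alpha_bars, rng):
    """Forward-diffuse every past action vector to noise level t.

    Returns sqrt(alpha_bar_t) * a + sqrt(1 - alpha_bar_t) * eps, with eps ~ N(0, 1)
    drawn independently for each action dimension.
    """
    ab = alpha_bars[t]
    signal, sigma = math.sqrt(ab), math.sqrt(1.0 - ab)
    return [[signal * x + sigma * rng.gauss(0.0, 1.0) for x in action]
            for action in past_actions]

# Hypothetical usage: a short history of 3-D past actions (made-up values).
rng = random.Random(0)
alpha_bars = linear_alpha_bars()
history = [[0.10, -0.20, 0.05], [0.12, -0.18, 0.04]]

# Training (assumption): draw a random noise level for each sample.
t_train = rng.randrange(len(alpha_bars))
noisy_train = noise_history(history, t_train, alpha_bars, rng)

# Inference (assumption): use a fixed noise level that was covered during
# training, so the policy never conditions on a perfectly clean history.
noisy_infer = noise_history(history, 10, alpha_bars, rng)
```

Corrupting the history this way discourages the policy from copying past actions verbatim, which is one plausible route to the reduced sensitivity to inaccurate histories that the abstract describes.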