Learning to Forget -- Hierarchical Episodic Memory for Lifelong Robot Deployment

arXiv cs.RO / 4/14/2026

Key Points

  • The paper proposes H$^2$-EMV, a hierarchical episodic memory framework for lifelong humanoid robot deployment that learns what information to retain or forget.
  • It addresses the scalability problem of continuous multimodal memory by using language-model-based relevance estimation and user interaction to drive selective forgetting.
  • The system incrementally builds a hierarchical episodic memory structure and refines learned natural-language rules when users provide feedback about missing/forgotten details.
  • Experiments on simulated household tasks and 20.5 hours of real ARMAR-7 recordings show that question-answering accuracy is maintained while memory footprint drops by 45% and query-time compute by 35%.
  • Performance improves through interaction: accuracy on second-round queries reportedly increases by 70% as the system adapts to user-specific priorities, supporting more personalized long-term human-robot collaboration.

Abstract

Robots must verbalize their past experiences when users ask "Where did you put my keys?" or "Why did the task fail?" Yet maintaining lifelong episodic memory (EM) from continuous multimodal perception quickly exceeds storage limits and makes real-time querying impractical, calling for selective forgetting that adapts to users' notions of relevance. We present H$^2$-EMV, a framework enabling humanoids to learn what to remember through user interaction. Our approach incrementally constructs hierarchical EM, selectively forgets using language-model-based relevance estimation conditioned on learned natural-language rules, and updates these rules given user feedback about forgotten details. Evaluations on simulated household tasks and 20.5 hours of real-world recordings from ARMAR-7 demonstrate that H$^2$-EMV maintains question-answering accuracy while reducing memory size by 45% and query-time compute by 35%. Critically, performance improves over time: accuracy increases by 70% in second-round queries as the system adapts to user-specific priorities, demonstrating that learned forgetting enables scalable, personalized EM for long-term human-robot collaboration.
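
The loop the abstract describes (build a hierarchy, forget low-relevance details under natural-language rules, revise the rules from user feedback) can be illustrated with a small Python sketch. This is not the paper's implementation. Every name here (`MemoryNode`, `keyword_relevance`, `forget`, `update_rules`, `STOPWORDS`) is hypothetical, and the keyword matcher is only a toy stand-in for the language-model relevance estimator, whose actual interface the abstract does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical scorer signature: (event text, rules) -> relevance in [0, 1].
RelevanceFn = Callable[[str, List[str]], float]

# Assumption: filler words ignored by the toy matcher below.
STOPWORDS = {"remember", "where", "the", "are", "a", "is", "to", "on"}


@dataclass
class MemoryNode:
    """One node in the hierarchy: a leaf event or a summary of its children."""
    summary: str
    children: List["MemoryNode"] = field(default_factory=list)


def keyword_relevance(text: str, rules: List[str]) -> float:
    """Toy stand-in for a language-model relevance estimator.

    Returns 1.0 if the text shares a non-stopword with any rule, else 0.0.
    """
    words = set(text.lower().split())
    for rule in rules:
        if words & (set(rule.lower().split()) - STOPWORDS):
            return 1.0
    return 0.0


def forget(node: MemoryNode, rules: List[str], score: RelevanceFn,
           threshold: float = 0.5) -> MemoryNode:
    """Return a pruned copy in which low-relevance leaf details are dropped.

    Inner nodes keep their summary, so a coarse trace of the episode survives.
    """
    kept = []
    for child in node.children:
        if child.children:
            kept.append(forget(child, rules, score, threshold))
        elif score(child.summary, rules) >= threshold:
            kept.append(child)
    return MemoryNode(node.summary, kept)


def update_rules(rules: List[str], feedback: str) -> List[str]:
    """Add a user's feedback as a new rule unless it is already present."""
    return rules if feedback in rules else rules + [feedback]


def count_nodes(node: MemoryNode) -> int:
    """Number of nodes in the tree, used here as a crude memory-size proxy."""
    return 1 + sum(count_nodes(child) for child in node.children)


# A made-up episode with two levels below the root.
episode = MemoryNode("morning tidy-up", [
    MemoryNode("kitchen", [MemoryNode("wiped table"),
                           MemoryNode("moved cup to sink")]),
    MemoryNode("hallway", [MemoryNode("placed keys on shelf")]),
])

rules = ["remember where the keys are"]
pruned = forget(episode, rules, keyword_relevance)

# The user later asks about the cup and reports that the detail was missing.
rules_after_feedback = update_rules(rules, "remember the cup")
pruned_after_feedback = forget(episode, rules_after_feedback, keyword_relevance)

print(count_nodes(episode), count_nodes(pruned),
      count_nodes(pruned_after_feedback))  # prints: 6 4 5
```

The sketch re-prunes the same raw episode only to make the effect of the new rule visible. In a deployed system, details that were already forgotten would presumably be gone, so an updated rule would shape what is retained from later episodes.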