HAMLET: Switch your Vision-Language-Action Model into a History-Aware Policy
arXiv cs.RO / 4/16/2026
Key Points
- The paper argues that robotic vision-language-action (VLA) policies often fail on history-dependent tasks because they condition only on the current observation, discarding past context.
- It introduces HAMLET, a framework that upgrades an existing VLA into a history-aware policy using per-timestep “moment tokens” plus a lightweight memory module that aggregates those tokens across time for action prediction.
- The moment tokens are initialized via time-contrastive learning to better encode temporally distinctive perceptual information.
- Experiments show large gains on long-horizon, history-dependent real-world tasks: e.g., 76.4% average success with a GR00T N1.5 backbone, a 47.2-point improvement over the baseline.
- HAMLET also improves on prior state-of-the-art results on RoboCasa Kitchen (64.1% → 66.4% in the 100-demo setting) and LIBERO (95.6% → 97.7%), demonstrating effectiveness on general robot-manipulation benchmarks.
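The "moment token plus lightweight memory" idea described above can be sketched abstractly: compress each timestep's perceptual features into one token, then let the current token attend over past tokens before action prediction. The sketch below is a minimal NumPy illustration under our own simplifying assumptions (attention pooling with a single learned query, dot-product memory readout); the function names, shapes, and pooling choices are hypothetical, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moment_token(frame_feats, query):
    """Compress one timestep's patch features into a single 'moment token'
    via attention pooling with a learned query vector (hypothetical design)."""
    # frame_feats: (num_patches, d), query: (d,)
    w = softmax(frame_feats @ query / np.sqrt(frame_feats.shape[-1]))
    return w @ frame_feats  # (d,)

def memory_readout(past_tokens, current):
    """Lightweight memory: the current token attends over all past moment
    tokens; the aggregated context is concatenated for the action head."""
    # past_tokens: (T, d), current: (d,)
    attn = softmax(past_tokens @ current / np.sqrt(current.shape[-1]))
    context = attn @ past_tokens  # (d,)
    return np.concatenate([current, context])  # (2d,) policy input

d, patches, T = 16, 8, 5
query = rng.normal(size=d)  # stand-in for a learned pooling query
history = np.stack(
    [moment_token(rng.normal(size=(patches, d)), query) for _ in range(T)]
)
policy_input = memory_readout(history[:-1], history[-1])
print(policy_input.shape)  # (32,)
```

Because each frame contributes exactly one token, the memory grows linearly in episode length rather than in raw patch count, which is what makes per-timestep compression attractive for long horizons.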
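Time-contrastive learning, used above to initialize the moment tokens, generally treats temporally adjacent frames as positives and distant frames in the same trajectory as negatives. The snippet below is an InfoNCE-style sketch of that generic objective; the window size, temperature, and loss layout are our assumptions, not the paper's exact training recipe.

```python
import numpy as np

def time_contrastive_loss(tokens, pos_offset=1, temperature=0.1):
    """InfoNCE-style time-contrastive objective (sketch): for each timestep t,
    the token at t + pos_offset is the positive; all other timesteps in the
    trajectory act as negatives. tokens: (T, d), assumed L2-normalized."""
    T = tokens.shape[0]
    sims = tokens @ tokens.T / temperature  # (T, T) scaled similarities
    total, count = 0.0, 0
    for t in range(T - pos_offset):
        logits = np.delete(sims[t], t)      # drop self-similarity
        pos_idx = t + pos_offset - 1        # index shifts after deletion
        log_prob = logits[pos_idx] - np.log(np.exp(logits).sum())
        total -= log_prob
        count += 1
    return total / count

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))
x /= np.linalg.norm(x, axis=1, keepdims=True)  # unit-norm tokens
loss = time_contrastive_loss(x)
print(loss > 0)  # True: cross-entropy over >1 candidate is positive
```

Pulling temporally adjacent embeddings together while pushing distant ones apart encourages each token to encode what is distinctive about its moment, which is the stated motivation for this initialization.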