ViVa: A Video-Generative Value Model for Robot Reinforcement Learning
arXiv cs.RO / 4/10/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper introduces ViVa, a video-generative value model designed for robot reinforcement learning to better estimate state values under partial observability and long-horizon tasks.
- ViVa conditions on the robot's current observation and proprioception, then jointly predicts future proprioception and a scalar value, using a pretrained video generator to inject spatiotemporal priors into value estimation.
- The approach targets a key limitation of prior VLM-based value models by capturing temporal dynamics rather than relying on static snapshot embeddings.
- Integrated into the RECAP framework, ViVa reportedly improves real-world box assembly performance and produces more reliable value signals that track task progress.
- Qualitative results suggest ViVa generalizes to novel objects across tasks, indicating that video-generative models may provide a promising foundation for value estimation in robotic settings.
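To make the architecture described above concrete, here is a minimal NumPy sketch of the joint prediction head: a shared trunk consumes the observation features and current proprioception, then two heads emit a future-proprioception rollout and a scalar value. All names, dimensions, and weights here are hypothetical placeholders — in the paper, the observation features would come from the pretrained video-generative backbone, and the weights would be learned, not random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
OBS_DIM, PROPRIO_DIM, HIDDEN, HORIZON = 64, 7, 128, 8

# Placeholder random weights; the real model would use a pretrained
# video-generator backbone plus trained prediction heads.
W_in = rng.normal(0, 0.02, (OBS_DIM + PROPRIO_DIM, HIDDEN))
W_proprio = rng.normal(0, 0.02, (HIDDEN, HORIZON * PROPRIO_DIM))
W_value = rng.normal(0, 0.02, (HIDDEN, 1))

def viva_forward(obs_features, proprio):
    """Jointly predict future proprioception and a scalar state value.

    obs_features : (OBS_DIM,) features from the video-generative backbone.
    proprio      : (PROPRIO_DIM,) current joint/gripper state.
    """
    x = np.concatenate([obs_features, proprio])
    h = np.tanh(x @ W_in)                                    # shared trunk
    future = (h @ W_proprio).reshape(HORIZON, PROPRIO_DIM)   # proprio rollout
    value = float(h @ W_value)                               # scalar value
    return future, value

future, value = viva_forward(rng.normal(size=OBS_DIM),
                             rng.normal(size=PROPRIO_DIM))
print(future.shape)  # (8, 7)
```

The key structural point the sketch illustrates is the joint output: because the value head shares its trunk with a temporal prediction target, the value estimate is shaped by predicted dynamics rather than a static snapshot embedding.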