Towards Generalizable Robotic Manipulation in Dynamic Environments
arXiv cs.RO / 4/16/2026
Key Points
- Vision-language-action (VLA) models perform poorly on robotic manipulation tasks in dynamic environments, largely because dynamic manipulation datasets are scarce and because the models rely on single-frame observations, which weakens their spatiotemporal reasoning.
- The paper introduces DOMINO, a large-scale dataset and benchmark with 35 tasks, hierarchical difficulty, 110K+ expert trajectories, and a multi-dimensional evaluation suite to study generalizable dynamic manipulation.
- It evaluates existing VLA systems on dynamic tasks, tests training strategies for improving dynamic awareness, and shows that training on dynamic data can also improve transfer to static manipulation.
- The authors propose PUMA, a dynamics-aware VLA architecture that uses scene-centric historical optical flow plus world queries for implicit short-horizon prediction of object-centric future states.
- PUMA achieves state-of-the-art results, with a 6.3% absolute improvement in success rate over baselines, and the authors release code and data on GitHub.
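
To make the single-frame limitation in the key points concrete, here is a minimal Python sketch of why a short observation history helps. It is not PUMA's implementation: the `ObservationHistory` class, the `horizon` parameter, the 2D-list frame format, and the frame-difference `motion_cue` are all hypothetical, and the frame difference is only a crude stand-in for the optical flow the paper describes. The point it illustrates is that a single frame carries no motion signal, while two or more frames do.

```python
from collections import deque


class ObservationHistory:
    """Hypothetical buffer holding the last `horizon` frames of a scene."""

    def __init__(self, horizon=4):
        # deque with maxlen drops the oldest frame automatically.
        self.frames = deque(maxlen=horizon)

    def push(self, frame):
        # frame: 2D list of grayscale intensities (assumed format).
        self.frames.append(frame)

    def motion_cue(self):
        """Mean absolute per-pixel change between the two latest frames.

        A crude illustration only; real optical flow estimates a
        per-pixel displacement field, not an intensity difference.
        """
        if len(self.frames) < 2:
            return 0.0  # a single frame carries no motion information
        prev, curr = self.frames[-2], self.frames[-1]
        diffs = [
            abs(c - p)
            for row_p, row_c in zip(prev, curr)
            for p, c in zip(row_p, row_c)
        ]
        return sum(diffs) / len(diffs)


history = ObservationHistory(horizon=3)
static_frame = [[0, 0], [0, 0]]
moved_frame = [[0, 1], [0, 0]]

history.push(static_frame)
history.push(static_frame)
static_cue = history.motion_cue()   # identical frames, no change

history.push(moved_frame)
dynamic_cue = history.motion_cue()  # one of four pixels changed by 1
```

A policy that sees only `moved_frame` cannot tell whether the object is moving; the buffered history is what exposes the change.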