AnchorRefine: Synergy-Manipulation Based on Trajectory Anchor and Residual Refinement for Vision-Language-Action Models
arXiv cs.RO / 4/21/2026
Key Points
- AnchorRefine addresses a key limitation of many vision-language-action (VLA) policies that optimize global motion and local corrections under a single objective, letting large motions dominate learning.
- The proposed hierarchical framework factorizes action modeling into a trajectory anchor planner for coarse motion scaffolding and a residual refinement module for execution-level geometric and contact corrections.
- It also adds a decision-aware gripper refinement mechanism to better handle discrete, boundary-sensitive gripper control.
- Experiments on LIBERO, CALVIN, and real-robot tasks show consistent improvements across both regression-based and diffusion-based VLA backbones, with up to 7.8% gains in simulation success and 18% in real-world success.
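The anchor-plus-residual factorization described above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the function names, shapes, and the goal-interpolated anchor are all assumptions; in AnchorRefine both stages would be learned modules conditioned on vision and language.

```python
import numpy as np

rng = np.random.default_rng(0)

def plan_anchor(obs, horizon=8, dof=7):
    """Coarse trajectory anchor: a low-frequency motion scaffold.

    Placeholder logic (assumption): linearly interpolate from the current
    pose toward a goal pose. A real planner would predict this from
    vision-language inputs.
    """
    start, goal = obs["pose"], obs["goal"]
    alphas = np.linspace(0.0, 1.0, horizon)[:, None]
    return (1.0 - alphas) * start + alphas * goal  # shape (horizon, dof)

def refine_residual(anchor, obs, scale=0.05):
    """Execution-level refinement: small geometric/contact corrections
    added on top of the anchor.

    Placeholder logic (assumption): random small deltas stand in for the
    learned residual module's output.
    """
    residual = scale * rng.standard_normal(anchor.shape)
    return anchor + residual

obs = {"pose": np.zeros(7), "goal": np.ones(7)}
anchor = plan_anchor(obs)          # coarse motion scaffold
actions = refine_residual(anchor, obs)  # final executed trajectory
```

The point of the factorization is that the residual stays small relative to the anchor, so fine corrections are not drowned out by the global-motion objective during training.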