Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation

arXiv cs.RO / 4/8/2026


Key Points

  • The paper proposes Referring-Aware Visuomotor Policy (ReV) to improve robotic manipulation robustness under out-of-distribution errors and dynamic trajectory re-routing, using only original expert demonstrations for training.
  • ReV enables real-time closed-loop replanning by letting humans (or planners) provide sparse referring points that steer trajectories without requiring dense additional annotations.
  • The method uses coupled diffusion heads: a global head generates temporally sparse action anchors and locates where the referring point fits in that sequence, while a local head interpolates between anchors based on the referring point’s temporal position.
  • Training is done by applying targeted perturbations to expert demonstrations, and the authors report higher success rates on challenging simulated and real-world tasks without extra data collection or fine-tuning.
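The last point, that training data comes from perturbing expert demonstrations rather than collecting new annotations, can be sketched roughly as follows. This is a hypothetical illustration with assumed details (function name, noise model, and trajectory shape are not from the paper): a point on an expert trajectory is jittered to create a synthetic referring point, so the policy can learn to re-route toward such points without extra data.

```python
# Hypothetical sketch: synthesize a referring point by perturbing an
# expert demonstration. Names and noise model are assumptions, not the
# paper's actual procedure.
import numpy as np

rng = np.random.default_rng(0)

def make_referring_sample(expert_traj, noise_scale=0.1):
    """Pick a random interior timestep and perturb the expert action
    there, simulating a sparse referring point for training."""
    expert_traj = np.asarray(expert_traj, dtype=float)
    t = int(rng.integers(1, len(expert_traj) - 1))  # avoid endpoints
    refer_point = expert_traj[t] + rng.normal(0.0, noise_scale, expert_traj.shape[1])
    return refer_point, t

# A toy 2-D expert trajectory with 10 waypoints.
traj = np.linspace([0.0, 0.0], [1.0, 1.0], 10)
refer, t = make_referring_sample(traj)
```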

Abstract

This paper addresses a fundamental problem in visuomotor policy learning for robotic manipulation: how to enhance robustness to out-of-distribution execution errors and dynamic trajectory re-routing when the model relies solely on the original expert demonstrations for training. We introduce the Referring-Aware Visuomotor Policy (ReV), a closed-loop framework that adapts to unforeseen circumstances by instantly incorporating sparse referring points provided by a human or a high-level reasoning planner. Specifically, ReV leverages coupled diffusion heads to preserve standard task execution patterns while seamlessly integrating sparse referring points via a trajectory-steering strategy. Upon receiving a referring point, the global diffusion head first generates a sequence of globally consistent yet temporally sparse action anchors and identifies the precise temporal position for the referring point within this sequence. Subsequently, the local diffusion head adaptively interpolates between adjacent anchors based on the current temporal position for the specific task. This closed-loop process repeats at every execution step, enabling real-time trajectory replanning in response to dynamic changes in the scene. In practice, rather than relying on elaborate annotations, ReV is trained only by applying targeted perturbations to expert demonstrations. Without any additional data or fine-tuning scheme, ReV achieves higher success rates across challenging simulated and real-world tasks.
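The global-then-local generation scheme described above can be sketched in a minimal form. This is an assumed illustration, not the paper's implementation: the actual method uses a global diffusion head to produce the sparse anchors and a local diffusion head to interpolate between them, whereas here the referring point is simply inserted at its temporal slot and plain linear interpolation densifies the trajectory.

```python
# Minimal sketch of the anchor-then-interpolate steering idea (assumed
# names and logic): the "global" output is a sparse anchor sequence plus
# a slot index for the referring point; the "local" step densifies each
# segment. Linear interpolation stands in for the local diffusion head.
import numpy as np

def steer_trajectory(anchors, refer_point, slot, steps_per_segment=4):
    """Insert `refer_point` after anchor index `slot`, then densify by
    linearly interpolating between consecutive anchors."""
    anchors = np.asarray(anchors, dtype=float)
    steered = np.insert(anchors, slot + 1, refer_point, axis=0)
    dense = []
    for a, b in zip(steered[:-1], steered[1:]):
        for t in np.linspace(0.0, 1.0, steps_per_segment, endpoint=False):
            dense.append((1 - t) * a + t * b)
    dense.append(steered[-1])
    return np.asarray(dense)

anchors = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]  # temporally sparse action anchors
traj = steer_trajectory(anchors, refer_point=[1.5, 0.5], slot=1)
```

In the closed-loop setting, this routine would be re-run at every execution step with the latest anchors and referring point, which is what allows the trajectory to react to scene changes in real time.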