BifrostUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation

arXiv cs.RO / 5/6/2026

Key Points

  • BifrostUMI is a new, robot-free data collection framework for training humanoid whole-body visuomotor policies, aiming to avoid the bottlenecks of robot teleoperation.
  • It uses lightweight VR to record human demonstrations as sparse keypoint trajectories while also capturing wrist-mounted visual data, producing multimodal training datasets.
  • The system trains a high-level policy network to predict future keypoint trajectories conditioned on the captured visual features, then retargets those trajectories onto the humanoid robot's body morphology (see the sketch after this list).
  • A keypoint retargeting pipeline and whole-body controller enable precise execution of agile behaviors learned from natural human demonstrations.
  • The authors report successful results in two distinct experimental scenarios, demonstrating both the framework's effectiveness and its versatility.
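To make the pipeline concrete, below is a minimal sketch of what such a high-level keypoint-prediction policy could look like. The paper does not detail its architecture here, so the module, its names, dimensions, and the plain MLP head are all illustrative assumptions rather than the authors' design.

```python
# Hypothetical sketch of a high-level keypoint-prediction policy in the
# spirit of BifrostUMI: regress a short horizon of future 3-D keypoints
# from fused wrist-camera features. All names and dimensions are assumed.
import torch
import torch.nn as nn

class KeypointPolicy(nn.Module):
    def __init__(self, feat_dim: int = 512, num_keypoints: int = 8, horizon: int = 16):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.horizon = horizon
        # Simple MLP head mapping visual features to a flat keypoint horizon.
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, horizon * num_keypoints * 3),
        )

    def forward(self, visual_features: torch.Tensor) -> torch.Tensor:
        # visual_features: (batch, feat_dim), e.g. pooled wrist-camera embeddings.
        out = self.head(visual_features)
        # Reshape to (batch, horizon, num_keypoints, 3): future keypoint trajectories.
        return out.view(-1, self.horizon, self.num_keypoints, 3)

policy = KeypointPolicy()
future_traj = policy(torch.randn(4, 512))  # -> torch.Size([4, 16, 8, 3])
```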

Abstract

High-quality data collection is a cornerstone of training humanoid whole-body visuomotor policies. Current data acquisition paradigms predominantly rely on robot teleoperation, which is often hindered by limited hardware accessibility and low operational efficiency. Inspired by the Universal Manipulation Interface (UMI), we propose BifrostUMI, a portable, efficient, and robot-free data collection framework tailored for humanoid robots. BifrostUMI leverages lightweight VR devices to capture human demonstrations as sparse keypoint trajectories while simultaneously recording wrist-mounted visual data. These multimodal data are subsequently used to train a high-level policy network that predicts future keypoint trajectories conditioned on the captured visual features. Through a robust keypoint retargeting pipeline, keypoint trajectories are precisely mapped onto the robot's morphology and executed via a whole-body controller. This approach enables the seamless transfer of diverse and agile behaviors from natural human demonstrations to humanoid embodiments. We demonstrate the efficacy and versatility of the proposed framework across two distinct experimental scenarios.
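As one way to picture the retargeting step, the sketch below re-expresses a human wrist trajectory relative to the shoulder and rescales it to the robot's reach before it would be handed to the whole-body controller. The paper describes this stage only as a "robust keypoint retargeting pipeline"; the uniform-scaling rule and every name below are assumptions made for illustration, not the authors' method.

```python
# Hypothetical sketch of keypoint retargeting: map shoulder-relative human
# keypoints onto the robot's morphology via a simple limb-length scaling.
# The real pipeline is likely more involved (e.g., IK with joint limits).
import numpy as np

def retarget_keypoints(human_traj, human_shoulder, robot_shoulder,
                       human_arm_len, robot_arm_len):
    """Map a (T, 3) human wrist trajectory onto the robot's frame.

    human_traj:     (T, 3) wrist positions in the human's frame.
    human_shoulder: (3,) human shoulder position (reference origin).
    robot_shoulder: (3,) robot shoulder position in the robot's frame.
    """
    scale = robot_arm_len / human_arm_len
    # Shoulder-relative offsets, uniformly rescaled to the robot's reach.
    offsets = (human_traj - human_shoulder) * scale
    return robot_shoulder + offsets

human_traj = np.random.rand(16, 3)
robot_traj = retarget_keypoints(
    human_traj,
    human_shoulder=np.zeros(3),
    robot_shoulder=np.array([0.0, 0.2, 1.3]),
    human_arm_len=0.65,
    robot_arm_len=0.55,
)
print(robot_traj.shape)  # (16, 3)
```

The retargeted trajectory would then serve as the reference input to the whole-body controller that tracks it on the physical humanoid.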