EgoSpot: Egocentric Multimodal Control for Hands-Free Mobile Manipulation

arXiv cs.RO · March 23, 2026


Key Points

  • The paper introduces EgoSpot, a hands-free multimodal control framework for the Boston Dynamics Spot robot using a Microsoft HoloLens 2 mixed-reality headset.
  • It targets accessibility needs by replacing manual joystick/handheld controller inputs with egocentric signals such as eye gaze, head gestures, and voice commands.
  • The system fuses multiple input modalities to enable real-time control of both Spot’s locomotion and its arm manipulation (a minimal sketch of one possible fusion scheme follows this list).
  • Experiments report task completion times and user experience comparable to traditional joystick-based control, while improving the accessibility and naturalness of interaction.
  • The authors position egocentric multimodal interfaces as a path toward more inclusive mobile manipulation robots, with a live demonstration available via their project webpage.
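The summary above gives no implementation details, but the arbitration between modalities can be illustrated with a short sketch. Below is one plausible scheme, assuming voice commands select the control mode, head pose drives locomotion, and eye gaze selects manipulation targets. All names (`FusionController`, `EgoInputs`) and the gain and dead-zone values are hypothetical, not taken from the paper.

```python
# Minimal sketch of one plausible fusion scheme (not the authors' published
# implementation): voice selects the control mode, head pose drives continuous
# locomotion, and eye gaze picks manipulation targets.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple

class Mode(Enum):
    IDLE = auto()
    DRIVE = auto()       # locomotion via head gestures
    MANIPULATE = auto()  # arm control via gaze targets

@dataclass
class EgoInputs:
    voice_cmd: Optional[str]   # e.g. "drive", "grasp", "stop"
    head_pitch: float          # radians, from headset head tracking
    head_yaw: float
    gaze_point: Optional[Tuple[float, float, float]]  # 3D gaze hit, world frame

class FusionController:
    def __init__(self) -> None:
        self.mode = Mode.IDLE

    def step(self, inp: EgoInputs) -> dict:
        """Fuse one frame of egocentric signals into a robot command."""
        # Discrete voice commands switch modes and double as a safety stop.
        if inp.voice_cmd == "stop":
            self.mode = Mode.IDLE
        elif inp.voice_cmd == "drive":
            self.mode = Mode.DRIVE
        elif inp.voice_cmd == "grasp":
            self.mode = Mode.MANIPULATE

        if self.mode == Mode.DRIVE:
            # Map head pitch/yaw to forward speed and turn rate, with a small
            # dead zone so a resting head posture does not move the robot.
            dead = 0.05
            v_x = 0.5 * inp.head_pitch if abs(inp.head_pitch) > dead else 0.0
            v_rot = -0.8 * inp.head_yaw if abs(inp.head_yaw) > dead else 0.0
            return {"type": "velocity", "v_x": v_x, "v_rot": v_rot}

        if self.mode == Mode.MANIPULATE and inp.gaze_point is not None:
            # A gaze fixation selects the grasp target for the arm.
            return {"type": "grasp_at", "point": inp.gaze_point}

        return {"type": "stand_still"}
```

A control loop would call `step()` once per headset frame and forward the returned command to the robot, so the most recent voice command always gates what the continuous gaze and head signals are allowed to do.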

Abstract

We propose a novel hands-free control framework for the Boston Dynamics Spot robot using the Microsoft HoloLens 2 mixed-reality headset. Enabling accessible robot control is critical for allowing individuals with physical disabilities to benefit from robotic assistance in daily activities, teleoperation, and remote interaction tasks. However, most existing robot control interfaces rely on manual input devices such as joysticks or handheld controllers, which can be difficult or impossible for users with limited motor capabilities. To address this limitation, we develop an intuitive multimodal control system that leverages egocentric sensing from a wearable device. Our system integrates multiple control signals, including eye gaze, head gestures, and voice commands, to enable hands-free interaction. These signals are fused to support real-time control of both robot locomotion and arm manipulation. Experimental results show that our approach achieves performance comparable to traditional joystick-based control in terms of task completion time and user experience, while significantly improving accessibility and naturalness of interaction. Our results highlight the potential of egocentric multimodal interfaces to make mobile manipulation robots more inclusive and usable for a broader population. A demonstration of the system is available on our project webpage.
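On the robot side, the abstract does not say how fused commands reach Spot. Boston Dynamics' public Python SDK (`bosdyn-client`) provides the usual path for body velocity commands, so a sketch under that assumption is given below. It is illustrative only: the IP address and credentials are placeholders, and lease acquisition, E-Stop registration, power-on, and stand commands are elided for brevity.

```python
# Illustrative only: forwarding a fused velocity command to Spot via the
# public bosdyn-client SDK. Lease/E-Stop/power-on handling is elided.
import time

import bosdyn.client
from bosdyn.client.robot_command import RobotCommandBuilder, RobotCommandClient

sdk = bosdyn.client.create_standard_sdk("EgoSpotSketch")
robot = sdk.create_robot("192.168.80.3")   # robot IP: placeholder
robot.authenticate("user", "password")     # credentials: placeholders
robot.time_sync.wait_for_sync()

command_client = robot.ensure_client(RobotCommandClient.default_service_name)

def send_velocity(v_x: float, v_y: float, v_rot: float,
                  horizon_s: float = 0.6) -> None:
    """Send a short-lived body velocity command. Callers re-issue it every
    control tick, so the command expires and the robot stops if the
    headset stream drops."""
    cmd = RobotCommandBuilder.synchro_velocity_command(v_x=v_x, v_y=v_y,
                                                       v_rot=v_rot)
    command_client.robot_command(cmd, end_time_secs=time.time() + horizon_s)
```

Re-issuing short-horizon commands each tick acts as a dead-man switch, which matters for a hands-free interface where the user has no joystick to release in an emergency.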