EgoSpot: Egocentric Multimodal Control for Hands-Free Mobile Manipulation
arXiv cs.RO / 3/23/2026
💬 Opinion · Signals & Early Trends · Models & Research
Key Points
- The paper introduces EgoSpot, a hands-free multimodal control framework for the Boston Dynamics Spot robot using a Microsoft HoloLens 2 mixed-reality headset.
- It targets accessibility needs by replacing manual joystick/handheld controller inputs with egocentric signals such as eye gaze, head gestures, and voice commands.
- The system fuses these modalities in real time to control both Spot's locomotion and its arm manipulation (see the sketch after this list).
- Experiments report task completion times and user-experience ratings comparable to traditional joystick-based control, while improving the accessibility and naturalness of interaction.
- The authors position egocentric multimodal interfaces as a path toward more inclusive mobile manipulation robots, with a live demonstration available via their project webpage.