HELIOS: Hierarchical Exploration for Language-Grounded Interaction in Open Scenes
arXiv cs.RO / 3/30/2026
Key Points
- The paper introduces HELIOS, a hierarchical scene representation and search objective for language-grounded mobile manipulation in novel, partially observed environments.
- HELIOS combines 2D navigation maps (semantic and occupancy) with actively built 3D Gaussian object representations, fusing multi-layer observations while enforcing multi-view detection consistency via a Dirichlet distribution.
- The planning problem is cast as a search over the hierarchical representation, with an objective that balances exploring frontiers and uncertain regions against the expected information gain that improves the semantic consistency of object detections.
- On the OVMM benchmark in the Habitat simulator, HELIOS achieves state-of-the-art performance, especially in large complex scenes with small target objects.
- The method is also demonstrated in a real office setting using a Spot robot, leveraging pretrained VLMs and avoiding task-specific training for language-specified pick-and-place.
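
To make the Dirichlet-based consistency idea and the exploration trade-off in the key points more concrete, here is a minimal Python sketch. It is not the paper's implementation: the function names, the additive pseudo-count update, the use of entropy as a stand-in for expected information gain, and the `trade_off` weight are all assumptions made for illustration.

```python
import math


def update_dirichlet(alpha, detection_probs, weight=1.0):
    """Fold one view's soft class scores into the Dirichlet pseudo-counts.

    Assumption: each view contributes its class probabilities as fractional
    counts. The summary does not state the paper's actual update rule.
    """
    return [a + weight * p for a, p in zip(alpha, detection_probs)]


def expected_probs(alpha):
    """Mean of the Dirichlet: the current belief over class labels."""
    total = sum(alpha)
    return [a / total for a in alpha]


def entropy(probs):
    """Shannon entropy in nats; high when views disagree about the label."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def score_candidate(frontier_gain, alpha, trade_off=0.5):
    """Hypothetical objective: weighted sum of exploration value and
    label uncertainty (used here as a crude proxy for information gain)."""
    uncertainty = entropy(expected_probs(alpha))
    return (1 - trade_off) * frontier_gain + trade_off * uncertainty


if __name__ == "__main__":
    # Uniform prior over three hypothetical labels: mug, bowl, other.
    alpha = [1.0, 1.0, 1.0]
    # Three views that mostly agree the object is a mug.
    for view in ([0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.9, 0.05, 0.05]):
        alpha = update_dirichlet(alpha, view)
    print("belief:", expected_probs(alpha))
    print("score:", score_candidate(frontier_gain=0.3, alpha=alpha))
```

In this sketch, consistent detections across views concentrate the belief on one label and lower its entropy, so an already well-observed object contributes less to the score than an ambiguous one. How HELIOS actually weighs these terms is not specified in this summary.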