MANSION: Multi-floor lANguage-to-3D Scene generatIOn for loNg-horizon tasks
arXiv cs.CV / 3/13/2026
Key Points
- The paper introduces MANSION, a language-driven framework that generates building-scale, multi-floor 3D environments for long-horizon robotic tasks.
- Alongside the framework, the authors release MansionWorld, a dataset of over 1,000 diverse buildings ranging from hospitals to offices, and a Task-Semantic Scene Editing Agent that supports open-vocabulary customization.
- The framework accounts for vertical structural constraints to create realistic, navigable buildings suitable for cross-floor planning and evaluation.
- Benchmark results show that the performance of state-of-the-art agents degrades sharply in these multi-floor settings, positioning MANSION as a testbed for next-generation spatial reasoning and planning.
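The summary does not say how MANSION represents buildings internally. As a rough illustration of what "vertical structural constraints" and "cross-floor planning" can mean in practice, here is a minimal sketch that assumes a toy model: each floor is a graph of rooms joined by doors, and a stairwell links two adjacent floors only when its footprint occupies the same (x, y) cell on both. All names (`Floor`, `check_vertical_alignment`, `build_graph`, `shortest_route`) are hypothetical and do not come from the paper. A breadth-first search then finds a route that spans several floors.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Floor:
    # Hypothetical toy model, not MANSION's actual representation.
    level: int
    doors: list[tuple[str, str]]        # room-to-room connections on this floor
    stairs: dict[str, tuple[int, int]]  # stairwell name -> (x, y) footprint cell


def check_vertical_alignment(floors):
    """Return (level, stairwell) pairs whose footprint differs on the floor above."""
    by_level = {f.level: f for f in floors}
    misaligned = []
    for f in floors:
        above = by_level.get(f.level + 1)
        if above is None:
            continue
        for name, xy in f.stairs.items():
            # A stairwell shared by two floors must occupy the same cell on both.
            if name in above.stairs and above.stairs[name] != xy:
                misaligned.append((f.level, name))
    return misaligned


def build_graph(floors):
    """Undirected adjacency map over (level, room) nodes."""
    graph = {}

    def link(a, b):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    by_level = {f.level: f for f in floors}
    for f in floors:
        for a, b in f.doors:
            link((f.level, a), (f.level, b))
        above = by_level.get(f.level + 1)
        if above is not None:
            for name, xy in f.stairs.items():
                # Only vertically aligned stairwells become cross-floor edges.
                if above.stairs.get(name) == xy:
                    link((f.level, name), (f.level + 1, name))
    return graph


def shortest_route(graph, start, goal):
    """Breadth-first search; returns the node list from start to goal, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    building = [
        Floor(0, doors=[("lobby", "stairA"), ("lobby", "cafe")], stairs={"stairA": (2, 3)}),
        Floor(1, doors=[("stairA", "corridor"), ("corridor", "ward")], stairs={"stairA": (2, 3)}),
    ]
    print(check_vertical_alignment(building))  # []
    print(shortest_route(build_graph(building), (0, "lobby"), (1, "ward")))
    # [(0, 'lobby'), (0, 'stairA'), (1, 'stairA'), (1, 'corridor'), (1, 'ward')]
```

In this design a misaligned stairwell is both reported by the validity check and left out of the graph, so a planner cannot route through a staircase that would be physically impossible. This mirrors, in a very simplified form, why vertical consistency matters for navigable multi-floor scenes.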