Memory Over Maps: 3D Object Localization Without Reconstruction
arXiv cs.RO, 2026-03-24
Key points
- The paper proposes a “map-free” 3D object localization pipeline that avoids building global 3D scene representations (e.g., point clouds, voxel grids, or scene graphs).
- Instead of reconstructing the scene, it stores a lightweight visual memory of posed RGB-D keyframes; at query time it retrieves relevant views, re-ranks them with a vision-language model, and performs sparse, on-demand 3D estimation via depth backprojection and multi-view fusion.
- The authors report preprocessing and storage improvements, claiming over two orders of magnitude faster scene indexing than reconstruction-based pipelines while using substantially less storage.
- They validate the approach on object-goal navigation tasks and find it performs strongly across multiple benchmarks without task-specific training, suggesting dense reconstruction may be unnecessary for object-centric robotic navigation.
- The work reframes object localization as retrieval and semantic re-ranking over image-based memory, leveraging vision-language reasoning to replace expensive 3D reconstruction steps.
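The on-demand 3D estimation step described above can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the function names, the pinhole backprojection, and the averaging fusion rule are all assumptions about what "depth backprojection and multi-view fusion" could look like in the simplest case.

```python
import numpy as np

def backproject(u, v, depth, K, T_world_cam):
    """Lift pixel (u, v) with metric depth into world coordinates.

    K: 3x3 camera intrinsics; T_world_cam: 4x4 camera-to-world pose.
    (Illustrative helper, not from the paper.)
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Pinhole model: ray through the pixel, scaled by the measured depth.
    p_cam = np.array([(u - cx) * depth / fx,
                      (v - cy) * depth / fy,
                      depth, 1.0])
    return (T_world_cam @ p_cam)[:3]

def fuse(points):
    """Naive multi-view fusion: average the per-view 3D estimates."""
    return np.mean(np.stack(points), axis=0)

# Example: identity pose, simple intrinsics, pixel at the principal point.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
T = np.eye(4)
p1 = backproject(320, 240, 2.0, K, T)   # point 2 m straight ahead
p2 = backproject(320, 240, 2.2, K, T)   # a second, noisier view
estimate = fuse([p1, p2])
```

Because only the retrieved keyframes for a given query are ever lifted to 3D, no global map is built or maintained, which is where the claimed indexing and storage savings come from.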
