FurnSet: Exploiting Repeats for 3D Scene Reconstruction

arXiv cs.CV / 4/23/2026

Key Points

The paper introduces FurnSet, a framework for single-view 3D scene reconstruction that explicitly leverages repeated object instances found in real-world scenes.
It adds per-object CLS tokens and uses set-aware self-attention to group identical instances and aggregate complementary observations for joint reconstruction.
The method guides object reconstruction by combining scene-level and object-level conditioning, then optimizes the overall layout using object point clouds with both 3D and 2D projection losses.
Experiments on 3D-Future and 3D-Front show improved reconstruction quality, indicating that exploiting repetition can make 3D scene reconstruction more robust.
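The idea of set-aware self-attention can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: it assumes the grouping of repeated instances is already known (the `group_ids` argument is hypothetical), uses a single head with no learned projections, and treats each row of `tokens` as a stand-in for a per-object CLS token. Each object attends only to objects in its own group, so features are aggregated across repeats and never across unrelated objects.

```python
import numpy as np

def set_aware_attention(tokens, group_ids):
    """Single-head self-attention restricted to tokens in the same group.

    tokens:    (n, d) array, one feature vector per object
               (hypothetical stand-in for per-object CLS tokens).
    group_ids: (n,) integer array; equal ids mark objects assumed to be repeats.
    Returns the aggregated features (n, d) and the attention weights (n, n).
    """
    tokens = np.asarray(tokens, dtype=np.float64)
    group_ids = np.asarray(group_ids)
    d = tokens.shape[1]

    # Scaled dot-product scores; queries, keys, and values are the raw tokens
    # (an assumption made to keep the sketch short).
    scores = tokens @ tokens.T / np.sqrt(d)

    # Mask out pairs that belong to different groups.
    same_group = group_ids[:, None] == group_ids[None, :]
    scores = np.where(same_group, scores, -np.inf)

    # Row-wise softmax; every row keeps at least its own diagonal entry.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)

    return weights @ tokens, weights

# Usage: objects 0, 1, and 3 are repeats; object 2 is unique.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
group_ids = np.array([0, 0, 1, 0])
aggregated, weights = set_aware_attention(tokens, group_ids)
```

In this sketch a unique object attends only to itself, so its features pass through unchanged, while repeated objects receive a weighted mix of the features of all their copies.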

Abstract

Single-view 3D scene reconstruction involves inferring both object geometry and spatial layout. Existing methods typically reconstruct objects independently or rely on implicit scene context, failing to exploit the repeated instances commonly present in real-world scenes. We propose FurnSet, a framework that explicitly identifies and leverages repeated object instances to improve reconstruction. Our method introduces per-object CLS tokens and a set-aware self-attention mechanism that groups identical instances and aggregates complementary observations across them, enabling joint reconstruction. We further combine scene-level and object-level conditioning to guide object reconstruction, followed by layout optimization using object point clouds with 3D and 2D projection losses for scene alignment. Experiments on 3D-Future and 3D-Front demonstrate improved scene reconstruction quality, highlighting the effectiveness of exploiting repetition for robust 3D scene reconstruction.
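The abstract does not give the exact form of the layout losses, so the following is only a sketch of how a 3D term and a 2D projection term could be combined when aligning an object point cloud to a scene. The assumptions are all mine: a symmetric Chamfer distance for both terms, a pinhole camera with hypothetical intrinsics `focal`, `cx`, `cy`, a pose made of uniform scale plus translation (rotation is omitted for brevity), and a hypothetical weight `w2d` that balances the two terms.

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance, both ways."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def project(points, focal, cx, cy):
    """Pinhole projection of (n, 3) camera-frame points with z > 0 to (n, 2) pixels."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([focal * x / z + cx, focal * y / z + cy], axis=1)

def layout_loss(obj_points, scale, translation, target_points,
                focal, cx, cy, w2d=1.0):
    """3D Chamfer term plus a 2D Chamfer term on the projected points."""
    placed = scale * obj_points + translation  # candidate object placement
    loss_3d = chamfer(placed, target_points)
    loss_2d = chamfer(project(placed, focal, cx, cy),
                      project(target_points, focal, cx, cy))
    return loss_3d + w2d * loss_2d

# Usage: the target is the object scaled by 2 and moved 5 units along the camera axis.
rng = np.random.default_rng(0)
obj = rng.uniform(-1.0, 1.0, size=(50, 3))
true_t = np.array([0.0, 0.0, 5.0])
target = 2.0 * obj + true_t

loss_at_truth = layout_loss(obj, 2.0, true_t, target, 500.0, 320.0, 240.0)
loss_shifted = layout_loss(obj, 2.0, true_t + np.array([0.5, 0.0, 0.0]),
                           target, 500.0, 320.0, 240.0)
```

The loss is zero at the true placement and grows when the object is shifted, which is the signal a layout optimizer would minimize. Because the 2D term is measured in pixels, it is typically much larger than the 3D term, which is why a weight such as `w2d` is needed in practice.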