FunRec: Reconstructing Functional 3D Scenes from Egocentric Interaction Videos

arXiv cs.CV / 4/8/2026


Key Points

  • FunRec is a new research method that reconstructs functional 3D “digital twin” indoor scenes from egocentric RGB-D interaction videos without relying on controlled capture setups or CAD priors.
  • The approach automatically discovers articulated parts, estimates their kinematic parameters, tracks 3D motion over time, and reconstructs both static and moving geometry in a canonical space suitable for simulation.
  • On newly introduced real and simulated benchmarks, FunRec reports large gains over prior articulated reconstruction methods, including up to +50 mIoU for part segmentation and substantially lower articulation and pose errors.
  • The paper demonstrates downstream usability via exports to simulation formats (URDF/USD) and interactive applications like affordance mapping and robot-scene interaction.
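The recovered kinematic parameters for a hinged part (door, lid, drawer front) amount to an axis direction, a pivot point, and a per-frame angle. As a rough illustration of how such a revolute-joint estimate can replay part motion, here is a minimal sketch using Rodrigues' rotation formula; the function name and interface are illustrative, not taken from the paper.

```python
import numpy as np

def revolute_transform(points, axis, pivot, angle):
    """Rotate `points` (shape (3,) or (N, 3)) by `angle` radians about the
    line through `pivot` with direction `axis`, via Rodrigues' formula."""
    k = np.asarray(axis, dtype=float)
    k /= np.linalg.norm(k)                      # unit joint axis
    p = np.asarray(points, dtype=float) - pivot # move pivot to the origin
    c, s = np.cos(angle), np.sin(angle)
    rotated = (p * c
               + np.cross(k, p) * s
               + k * (p @ k)[..., None] * (1 - c))
    return rotated + pivot                      # move back to world frame

# Example: a point 1 m from a vertical hinge, door opened by 90 degrees.
corner = np.array([1.0, 0.0, 0.0])
opened = revolute_transform(corner, axis=[0, 0, 1],
                            pivot=np.zeros(3), angle=np.pi / 2)
```

A prismatic joint (e.g. a drawer) is simpler still: the per-frame state is a scalar translation along the estimated axis.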

Abstract

We present FunRec, a method for reconstructing functional 3D digital twins of indoor scenes directly from egocentric RGB-D interaction videos. Unlike existing articulated-reconstruction methods, which rely on controlled setups, multi-state captures, or CAD priors, FunRec operates directly on in-the-wild human interaction sequences to recover interactable 3D scenes. It automatically discovers articulated parts, estimates their kinematic parameters, tracks their 3D motion, and reconstructs static and moving geometry in canonical space, yielding simulation-compatible meshes. Across new real and simulated benchmarks, FunRec surpasses prior work by a large margin, achieving up to +50 mIoU improvement in part segmentation, 5-10 times lower articulation and pose errors, and significantly higher reconstruction accuracy. We further demonstrate applications including URDF/USD export for simulation, hand-guided affordance mapping, and robot-scene interaction.
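To make the URDF export concrete, a reconstructed cabinet with one hinged door would map onto a kinematic tree like the fragment below. This is a generic, hand-written sketch of the standard URDF schema, not output from FunRec; link names, the joint origin, and the limit values are placeholders.

```xml
<robot name="cabinet">
  <link name="body"/>   <!-- static reconstructed geometry -->
  <link name="door"/>   <!-- moving part mesh in canonical space -->
  <joint name="door_hinge" type="revolute">
    <parent link="body"/>
    <child link="door"/>
    <!-- estimated pivot point of the hinge, in the body frame -->
    <origin xyz="0.4 0.0 0.7" rpy="0 0 0"/>
    <!-- estimated rotation axis (here: vertical) -->
    <axis xyz="0 0 1"/>
    <!-- range of motion observed during the interaction -->
    <limit lower="0.0" upper="1.57" effort="10.0" velocity="1.0"/>
  </joint>
</robot>
```

Each articulated part becomes a child link, and the estimated axis, pivot, and observed angle range fill the joint definition, which is what makes the result directly loadable by physics simulators.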