SceneExpander: Expanding 3D Scenes with Free-Form Inserted Views

arXiv cs.CV · March 31, 2026


Key Points

  • The paper introduces SceneExpander, a user-centric method to extend an existing reconstructed 3D scene by inserting an additional synthesized view rather than only editing objects or transferring styles in place.
  • It targets a key real-world failure mode: inserted views are often 3D-misaligned with the original multi-view reconstruction, causing geometry shifts, hallucinated content, and view-dependent artifacts that break global consistency.
  • SceneExpander uses test-time adaptation on a parametric feed-forward 3D reconstruction model with two distillation signals—anchor distillation to stabilize the original geometry and inserted-view self-distillation to adapt latent geometry and appearance.
  • Experiments on ETH scenes and online data show improved expansion behavior and reconstruction quality specifically when the inserted view is misaligned with the original reconstruction.

Abstract

World building with 3D scene representations is increasingly important for content creation, simulation, and interactive experiences, yet real workflows are inherently iterative: creators must repeatedly extend an existing scene under user control. Motivated by this gap, we study 3D scene expansion in a user-centric workflow: starting from a real scene captured by multi-view images, we extend its coverage by inserting an additional view synthesized by a generative model. Unlike simple object editing or style transfer in a fixed scene, the inserted view is often 3D-misaligned with the original reconstruction, introducing geometry shifts, hallucinated content, or view-dependent artifacts that break global multi-view consistency. To address this challenge, we propose SceneExpander, which applies test-time adaptation to a parametric feed-forward 3D reconstruction model with two complementary distillation signals: anchor distillation stabilizes the original scene by distilling geometric cues from the captured views, while inserted-view self-distillation preserves observation-supported predictions yet adapts latent geometry and appearance to accommodate the misaligned inserted view. Experiments on ETH scenes and online data demonstrate improved expansion behavior and reconstruction quality under misalignment.
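The two-signal recipe in the abstract can be sketched in miniature. The toy below is an assumption-laden illustration, not the paper's implementation: a linear map stands in for the parametric feed-forward reconstruction model, the anchor-distillation targets are the frozen model's predictions on the captured views, and the inserted-view pseudo-label carries a small constant shift standing in for 3D misalignment. Test-time adaptation is then plain gradient descent on the weighted sum of the two distillation losses.

```python
import numpy as np

# Hypothetical toy, assuming a linear "reconstruction model" f(x) = x @ theta.
# All names and hyperparameters here are illustrative, not from the paper.

rng = np.random.default_rng(0)
theta_frozen = rng.normal(size=(4, 3))       # pretrained feed-forward model
theta = theta_frozen.copy()                  # copy adapted at test time

anchor_views = rng.normal(size=(8, 4))       # original captured views
inserted_view = rng.normal(size=(1, 4))      # synthesized, misaligned view

# Distillation targets come from the frozen model: anchor targets pin the
# original geometry in place; the inserted-view pseudo-label is shifted by a
# constant to mimic the misalignment the method must absorb.
anchor_targets = anchor_views @ theta_frozen
inserted_target = inserted_view @ theta_frozen + 0.5

lam, lr = 0.3, 0.05                          # lam weights the inserted-view loss

def loss(th):
    ra = anchor_views @ th - anchor_targets          # anchor distillation
    ri = inserted_view @ th - inserted_target        # inserted-view self-distillation
    return (ra ** 2).sum() / len(anchor_views) + lam * (ri ** 2).sum()

loss_init = loss(theta)
for _ in range(200):                         # test-time adaptation loop
    g_anchor = 2 * anchor_views.T @ (anchor_views @ theta - anchor_targets)
    g_insert = 2 * inserted_view.T @ (inserted_view @ theta - inserted_target)
    theta -= lr * (g_anchor / len(anchor_views) + lam * g_insert)
loss_final = loss(theta)
```

In the actual method the adapted object is a parametric feed-forward 3D reconstruction network and both losses act on rendered views and latent geometry rather than on a linear map; the toy keeps only the structural idea that one loss anchors the original scene while the other pulls the model toward the misaligned inserted view.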