Stitch4D: Sparse Multi-Location 4D Urban Reconstruction via Spatio-Temporal Interpolation

arXiv cs.CV / 4/10/2026

Key Points

  • The paper introduces Stitch4D, a 4D urban reconstruction framework designed for cases where cameras are at multiple, spatially separated locations with little to no view overlap.
  • Stitch4D improves reconstruction by synthesizing intermediate “bridge” views to densify spatial constraints before jointly optimizing real and synthesized observations in a unified coordinate frame.
  • It applies explicit inter-location consistency constraints during joint optimization; together with the restored intermediate coverage, this avoids the temporal artifacts and geometric collapse that arise when dense-view 4D methods are applied to sparse data.
  • The authors also release a CARLA-based benchmark, Urban Sparse 4D (U-S4D), to evaluate spatiotemporal alignment under sparse multi-location configurations.
  • Experiments on U-S4D show Stitch4D outperforming representative 4D reconstruction baselines and achieving better visual quality, which the authors take as evidence that recovering intermediate spatial coverage is essential for stable 4D reconstruction.
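
To make the "bridge view" idea concrete, the sketch below shows one plausible way to choose where intermediate viewpoints could sit between two separated cameras: linear interpolation of position and spherical interpolation (slerp) of orientation. This is an illustrative assumption, not the paper's procedure. The summary does not say how Stitch4D places or renders its bridge views, and the names `bridge_poses`, `lerp`, and `slerp` are hypothetical. The code only produces poses; synthesizing the images at those poses is outside its scope.

```python
import math


def lerp(a, b, t):
    """Linearly interpolate between two equal-length tuples."""
    return tuple((1.0 - t) * x + t * y for x, y in zip(a, b))


def slerp(q0, q1, t):
    """Spherically interpolate between unit quaternions given as (w, x, y, z)."""
    dot = sum(x * y for x, y in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip one to take the shorter arc.
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical orientations: linear interpolation is numerically safer.
        q = lerp(q0, q1, t)
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        q = tuple(s0 * x + s1 * y for x, y in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def bridge_poses(pos_a, quat_a, pos_b, quat_b, num_bridges):
    """Return num_bridges evenly spaced (position, quaternion) poses strictly
    between camera A and camera B. Hypothetical helper, not from the paper."""
    poses = []
    for i in range(1, num_bridges + 1):
        t = i / (num_bridges + 1)
        poses.append((lerp(pos_a, pos_b, t), slerp(quat_a, quat_b, t)))
    return poses


if __name__ == "__main__":
    # Two cameras 40 m apart; the second is yawed 90 degrees about the z axis.
    cam_a = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0))
    cam_b = ((40.0, 0.0, 0.0), (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4)))
    for position, quaternion in bridge_poses(*cam_a, *cam_b, num_bridges=3):
        print(position, quaternion)
```

Evenly spaced poses are the simplest choice. A real pipeline would likely also account for road layout, occlusions, and camera intrinsics when deciding where intermediate views are useful.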

Abstract

Dynamic urban environments are often captured by cameras placed at spatially separated locations with little or no view overlap. However, most existing 4D reconstruction methods assume densely overlapping views. When applied to such sparse observations, these methods fail to reconstruct intermediate regions and often introduce temporal artifacts. To address this practical yet underexplored sparse multi-location setting, we propose Stitch4D, a unified 4D reconstruction framework that explicitly compensates for missing spatial coverage in sparse observations. Stitch4D (i) synthesizes intermediate bridge views to densify spatial constraints and improve spatial coverage, and (ii) jointly optimizes real and synthesized observations within a unified coordinate frame under explicit inter-location consistency constraints. By restoring intermediate coverage before optimization, Stitch4D prevents geometric collapse and reconstructs coherent geometry and smooth scene dynamics even in sparsely observed environments. To evaluate this setting, we introduce Urban Sparse 4D (U-S4D), a CARLA-based benchmark designed to assess spatiotemporal alignment under sparse multi-location configurations. Experimental results on U-S4D show that Stitch4D surpasses representative 4D reconstruction baselines and achieves superior visual quality. These results indicate that recovering intermediate spatial coverage is essential for stable 4D reconstruction in sparse urban environments.
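
As a rough illustration of what "jointly optimizing real and synthesized observations under inter-location consistency constraints" could look like, the sketch below combines three terms into one scalar objective. Everything specific here is an assumption made for illustration: the abstract does not give the loss form, so the L1 error, the weights `w_bridge` and `w_consistency`, and the function names are hypothetical, and the consistency term is modeled simply as disagreement between two locations' estimates of the same quantity.

```python
def mean_abs_error(pred, target):
    """Mean absolute difference between two equal-length sequences of floats."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def _mean(values):
    """Average of a list, or 0.0 when the list is empty."""
    return sum(values) / len(values) if values else 0.0


def joint_loss(real_pairs, bridge_pairs, shared_pairs,
               w_bridge=0.5, w_consistency=0.1):
    """Weighted sum of three illustrative terms (all names are hypothetical).

    real_pairs:   (rendered, captured) pixel values for real camera views.
    bridge_pairs: (rendered, synthesized) pixel values for bridge views,
                  down-weighted because synthesized targets are less reliable.
    shared_pairs: (estimate_from_location_a, estimate_from_location_b) for
                  quantities both locations should agree on, for example
                  depths of the same scene points in the unified frame.
    """
    real_term = _mean([mean_abs_error(p, t) for p, t in real_pairs])
    bridge_term = _mean([mean_abs_error(p, t) for p, t in bridge_pairs])
    consistency_term = _mean([mean_abs_error(a, b) for a, b in shared_pairs])
    return real_term + w_bridge * bridge_term + w_consistency * consistency_term


if __name__ == "__main__":
    real = [([0.5, 0.5], [0.5, 0.7])]
    bridge = [([0.2, 0.2], [0.4, 0.4])]
    shared = [([1.0, 2.0], [1.0, 2.4])]
    print(joint_loss(real, bridge, shared))
```

The point of the sketch is the structure, not the numbers: real views anchor the reconstruction, bridge views add constraints in otherwise unobserved regions, and the consistency term ties the separate locations into one coordinate frame.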