LiveStre4m: Feed-Forward Live Streaming of Novel Views from Unposed Multi-View Video

arXiv cs.CV / 4/9/2026


Key Points

  • LiveStre4m targets real-time novel view synthesis (NVS) from unposed sparse multi-view video, addressing the limitation that prior dynamic-scene methods require ground-truth camera parameters and slow per-scene optimization.
  • The system uses a multi-view vision transformer for keyframe 3D scene reconstruction, paired with a diffusion-transformer interpolation module to maintain temporal consistency for streaming.
  • A Camera Pose Predictor estimates both camera poses and intrinsics directly from RGB images, eliminating dependence on known calibration.
  • The method achieves ~0.07 seconds per frame at 1024×768 resolution and can run with as few as two synchronized unposed input streams, outperforming optimization-based approaches by orders of magnitude in runtime.
  • The paper releases code via GitHub, aiming to make deployable live novel-view synthesis systems more practical.
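The three components above (pose prediction, keyframe reconstruction, and diffusion-based interpolation) form a feed-forward pipeline. The sketch below shows how such a loop could be wired together; all function names, shapes, and internals are illustrative stand-ins, not the actual LiveStre4m implementation.

```python
import numpy as np

def predict_poses(frames):
    """Stand-in for the Camera Pose Predictor: estimates a 4x4 pose and
    3x3 intrinsics per view from RGB alone (here, placeholder identities)."""
    n_views = frames.shape[0]
    poses = np.tile(np.eye(4), (n_views, 1, 1))
    intrinsics = np.tile(np.eye(3), (n_views, 1, 1))
    return poses, intrinsics

def reconstruct_keyframe(frames, poses, intrinsics):
    """Stand-in for the multi-view vision transformer that reconstructs
    a 3D scene representation at each keyframe."""
    return {"scene": frames.mean(axis=0)}  # placeholder representation

def interpolate(prev_key, next_key, t):
    """Stand-in for the diffusion-transformer module that fills frames
    between keyframes to keep the stream temporally consistent."""
    return (1 - t) * prev_key["scene"] + t * next_key["scene"]

# Two synchronized, unposed 1024x768 streams -- the paper's minimal setup.
stream = [np.zeros((2, 768, 1024, 3)) for _ in range(3)]
poses, K = predict_poses(stream[0])
key0 = reconstruct_keyframe(stream[0], poses, K)
key1 = reconstruct_keyframe(stream[-1], poses, K)
mid = interpolate(key0, key1, 0.5)  # in-between frame at t = 0.5
print(mid.shape)  # (768, 1024, 3)
```

The keyframe-plus-interpolation split is what makes streaming tractable: the expensive reconstruction runs only on keyframes, while the lighter interpolation module covers the frames in between.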

Abstract

Live-streaming Novel View Synthesis (NVS) from unposed multi-view video remains an open challenge in a wide range of applications. Existing methods for dynamic scene representation typically require ground-truth camera parameters and involve lengthy optimizations (≈2.67 s), which makes them unsuitable for live streaming scenarios. To address this issue, we propose a novel-viewpoint video live-streaming method (LiveStre4m), a feed-forward model for real-time NVS from unposed sparse multi-view inputs. LiveStre4m introduces a multi-view vision transformer for keyframe 3D scene reconstruction coupled with a diffusion-transformer interpolation module that ensures temporal consistency and stable streaming. In addition, a Camera Pose Predictor module is proposed to efficiently estimate both poses and intrinsics directly from RGB images, removing the reliance on known camera calibration information. Our approach enables temporally consistent novel-view video streaming in real time using as few as two synchronized unposed input streams. LiveStre4m attains an average reconstruction time of 0.07 s per frame at 1024×768 resolution, outperforming optimization-based dynamic scene representation methods by orders of magnitude in runtime. These results demonstrate that LiveStre4m makes real-time NVS streaming feasible in practical settings, marking a substantial step toward deployable live novel-view synthesis systems. Code available at: https://github.com/pedro-quesado/LiveStre4m
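The runtime claim is easy to sanity-check from the two per-frame times the abstract reports (0.07 s for LiveStre4m, ≈2.67 s for the optimization-based baselines):

```python
# Frame-budget arithmetic using only the numbers quoted in the abstract.
livestream_s = 0.07  # LiveStre4m per-frame reconstruction time
baseline_s = 2.67    # optimization-based per-frame time cited for prior work

fps = 1.0 / livestream_s          # achievable streaming rate
speedup = baseline_s / livestream_s
print(f"{fps:.1f} FPS, {speedup:.0f}x faster")  # prints "14.3 FPS, 38x faster"
```

At roughly 14 frames per second, the method clears the threshold for watchable live streaming, whereas a 2.67 s per-frame baseline would render well under one frame per second.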