LiveStre4m: Feed-Forward Live Streaming of Novel Views from Unposed Multi-View Video

arXiv cs.CV / 4/9/2026


Key Points

  • LiveStre4m targets real-time novel view synthesis (NVS) from unposed sparse multi-view video, addressing the limitation that prior dynamic-scene methods require ground-truth camera parameters and slow per-scene optimization.
  • The system uses a multi-view vision transformer for keyframe 3D scene reconstruction, paired with a diffusion-transformer interpolation module to maintain temporal consistency for streaming.
  • A Camera Pose Predictor estimates both camera poses and intrinsics directly from RGB images, eliminating dependence on known calibration.
  • The method achieves ~0.07 seconds per frame at 1024×768 resolution and can run with as few as two synchronized unposed input streams, outperforming optimization-based approaches by orders of magnitude in runtime.
  • The paper releases code via GitHub, aiming to make deployable live novel-view synthesis systems more practical.
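The three components above (pose prediction, keyframe reconstruction, and diffusion-based interpolation) form a feed-forward pipeline. The sketch below shows how such a loop could be wired together; all function names, shapes, and internals are illustrative stand-ins, not the actual LiveStre4m implementation.

```python
import numpy as np

def predict_poses(frames):
    """Stand-in for the Camera Pose Predictor: estimates a 4x4 pose and
    3x3 intrinsics per view from RGB alone (here, placeholder identities)."""
    n_views = frames.shape[0]
    poses = np.tile(np.eye(4), (n_views, 1, 1))
    intrinsics = np.tile(np.eye(3), (n_views, 1, 1))
    return poses, intrinsics

def reconstruct_keyframe(frames, poses, intrinsics):
    """Stand-in for the multi-view vision transformer that reconstructs
    a 3D scene representation at each keyframe."""
    return {"scene": frames.mean(axis=0)}  # placeholder representation

def interpolate(prev_key, next_key, t):
    """Stand-in for the diffusion-transformer module that fills frames
    between keyframes to keep the stream temporally consistent."""
    return (1 - t) * prev_key["scene"] + t * next_key["scene"]

# Two synchronized, unposed 1024x768 streams -- the paper's minimal setup.
stream = [np.zeros((2, 768, 1024, 3)) for _ in range(3)]
poses, K = predict_poses(stream[0])
key0 = reconstruct_keyframe(stream[0], poses, K)
key1 = reconstruct_keyframe(stream[-1], poses, K)
mid = interpolate(key0, key1, 0.5)  # in-between frame at t = 0.5
print(mid.shape)  # (768, 1024, 3)
```

The keyframe-plus-interpolation split is what makes streaming tractable: the expensive reconstruction runs only on keyframes, while the lighter interpolation module covers the frames in between.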

Abstract

Live-streaming Novel View Synthesis (NVS) from unposed multi-view video remains an open challenge in a wide range of applications. Existing methods for dynamic scene representation typically require ground-truth camera parameters and involve lengthy optimizations (≈2.67 s), which makes them unsuitable for live streaming scenarios. To address this issue, we propose a novel-viewpoint video live-streaming method (LiveStre4m), a feed-forward model for real-time NVS from unposed sparse multi-view inputs. LiveStre4m introduces a multi-view vision transformer for keyframe 3D scene reconstruction coupled with a diffusion-transformer interpolation module that ensures temporal consistency and stable streaming. In addition, a Camera Pose Predictor module is proposed to efficiently estimate both poses and intrinsics directly from RGB images, removing the reliance on known camera calibration information. Our approach enables temporally consistent novel-view video streaming in real time using as few as two synchronized unposed input streams. LiveStre4m attains an average reconstruction time of 0.07 s per frame at 1024×768 resolution, outperforming optimization-based dynamic scene representation methods by orders of magnitude in runtime. These results demonstrate that LiveStre4m makes real-time NVS streaming feasible in practical settings, marking a substantial step toward deployable live novel-view synthesis systems. Code available at: https://github.com/pedro-quesado/LiveStre4m
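The runtime claim is easy to sanity-check from the two per-frame times the abstract reports (0.07 s for LiveStre4m, ≈2.67 s for the optimization-based baselines):

```python
# Frame-budget arithmetic using only the numbers quoted in the abstract.
livestream_s = 0.07  # LiveStre4m per-frame reconstruction time
baseline_s = 2.67    # optimization-based per-frame time cited for prior work

fps = 1.0 / livestream_s          # achievable streaming rate
speedup = baseline_s / livestream_s
print(f"{fps:.1f} FPS, {speedup:.0f}x faster")  # prints "14.3 FPS, 38x faster"
```

At roughly 14 frames per second, the method clears the threshold for watchable live streaming, whereas a 2.67 s per-frame baseline would render well under one frame per second.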