Generalizable Sparse-View 3D Reconstruction from Unconstrained Images

arXiv cs.CV / 5/1/2026


Key Points

  • The paper addresses the difficulty of sparse-view, unposed 3D reconstruction in real-world settings with changing illumination and transient occlusions, where prior methods often require per-scene optimization.
  • It introduces GenWildSplat, a feed-forward framework that predicts depth, camera parameters, and 3D Gaussians in a canonical space from unposed internet images without any test-time per-scene optimization.
  • GenWildSplat uses learned geometric priors, an appearance adapter to adjust appearance for target lighting, and semantic segmentation to manage transient objects.
  • The approach is trained via curriculum learning on both synthetic and real data to improve generalization across varied illumination and occlusion conditions.
  • Experiments on PhotoTourism and MegaScenes show state-of-the-art rendering quality at real-time inference speed, with strong generalization relative to scene-specific baselines.
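
The appearance-adapter and transient-handling ideas in the points above can be sketched roughly as follows. This is a hypothetical illustration only: the function names, shapes, and the affine scale/shift formulation are assumptions for the sketch, not GenWildSplat's actual architecture.

```python
import numpy as np

# Illustrative sketch of two ideas from the summary above:
# (1) an appearance adapter that modulates per-Gaussian color for a target
#     lighting condition, and (2) dropping Gaussians whose projections land
#     on transient-object pixels found by semantic segmentation.
# All names and shapes are assumptions, not the paper's API.

rng = np.random.default_rng(0)

def appearance_adapter(colors, lighting_code, W_scale, W_shift):
    """Affine (scale/shift) modulation of Gaussian colors.

    colors:          (N, 3) base RGB per Gaussian
    lighting_code:   (D,)   embedding of the target illumination
    W_scale/W_shift: (D, 3) learned projections (random here for the sketch)
    """
    scale = 1.0 + np.tanh(lighting_code @ W_scale)   # per-channel gain near 1
    shift = 0.1 * np.tanh(lighting_code @ W_shift)   # small per-channel offset
    return np.clip(colors * scale + shift, 0.0, 1.0)

def mask_transients(gaussians_uv, seg_mask):
    """Keep only Gaussians whose 2D projected centers hit static pixels.

    gaussians_uv: (N, 2) integer pixel coordinates of projected centers
    seg_mask:     (H, W) bool, True where segmentation marks a transient
    """
    u, v = gaussians_uv[:, 0], gaussians_uv[:, 1]
    return ~seg_mask[v, u]

N, D, H, W = 6, 8, 4, 4
colors = rng.random((N, 3))
code = rng.standard_normal(D)
Ws, Wb = rng.standard_normal((D, 3)), rng.standard_normal((D, 3))
relit = appearance_adapter(colors, code, Ws, Wb)

seg = np.zeros((H, W), dtype=bool)
seg[0, :] = True                       # pretend row 0 covers a pedestrian
uv = rng.integers(0, 4, size=(N, 2))   # projected Gaussian centers
keep = mask_transients(uv, seg)
print(relit.shape, keep.shape)
```

In a real feed-forward pipeline both components would be learned end to end; here the projections are random just to exercise the shapes.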

Abstract

Reconstructing 3D scenes from sparse, unposed images remains challenging under real-world conditions with varying illumination and transient occlusions. Existing methods rely on scene-specific optimization using appearance embeddings or dynamic masks, which requires extensive per-scene training and fails under sparse views. Moreover, evaluations on limited scenes raise questions about generalization. We present GenWildSplat, a feed-forward framework for sparse-view outdoor reconstruction that requires no per-scene optimization. Given unposed internet images, GenWildSplat predicts depth, camera parameters, and 3D Gaussians in a canonical space using learned geometric priors. An appearance adapter modulates appearance for target lighting conditions, while semantic segmentation handles transient objects. Through curriculum learning on synthetic and real data, GenWildSplat generalizes across diverse illumination and occlusion patterns. Evaluations on the PhotoTourism and MegaScenes benchmarks demonstrate state-of-the-art feed-forward rendering quality, achieving real-time inference without test-time optimization.
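
The curriculum-learning step mentioned in the abstract can be sketched as a sampling schedule that starts mostly on synthetic scenes and gradually shifts toward real internet photos. The linear annealing, the start/end fractions, and the function names below are illustrative assumptions, not the paper's actual schedule.

```python
import random

# Hypothetical curriculum sketch: anneal the probability of drawing a
# synthetic training batch from 0.9 down to 0.2 over the course of training.
# The specific schedule is an assumption for illustration.

def synthetic_fraction(step, total_steps, start=0.9, end=0.2):
    """Linearly anneal the probability of sampling a synthetic batch."""
    t = min(step / total_steps, 1.0)
    return start + (end - start) * t

def sample_domain(step, total_steps, rng):
    """Pick the data domain for this training step."""
    p = synthetic_fraction(step, total_steps)
    return "synthetic" if rng.random() < p else "real"

rng = random.Random(0)
schedule = [synthetic_fraction(s, 100) for s in (0, 50, 100)]
print(schedule)
```

Early steps thus see mostly clean synthetic geometry, while later steps are dominated by real photos with harder illumination and occlusion statistics.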