Generalizable Sparse-View 3D Reconstruction from Unconstrained Images
arXiv cs.CV / 5/1/2026
Key Points
- The paper addresses the difficulty of sparse-view, unposed 3D reconstruction in real-world settings with changing illumination and transient occlusions, where prior methods often require per-scene optimization.
- It introduces GenWildSplat, a feed-forward framework that predicts depth, camera parameters, and 3D Gaussians in a canonical space from unposed internet images without any test-time per-scene optimization.
- GenWildSplat combines learned geometric priors, an appearance adapter that harmonizes images toward a target lighting condition, and semantic segmentation to mask transient objects.
- The approach is trained via curriculum learning on both synthetic and real data to improve generalization across varied illumination and occlusion conditions.
- Experiments on PhotoTourism and MegaScenes show state-of-the-art rendering quality at real-time inference speed, with stronger generalization than scene-specific baselines.
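A core step in feed-forward pipelines of this kind is lifting a predicted depth map to 3D Gaussian centers while excluding pixels flagged as transient by segmentation. The sketch below is illustrative only, not the paper's implementation: the function name, the simple boolean transient mask, and the identity intrinsics in the toy example are all assumptions.

```python
import numpy as np

def unproject_gaussian_centers(depth, transient_mask, K):
    """Lift per-pixel depth to 3D Gaussian centers in camera space,
    dropping pixels a segmentation model flagged as transient.

    depth:          (H, W) predicted depth map
    transient_mask: (H, W) bool, True where a transient object was detected
    K:              (3, 3) camera intrinsics
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel coordinates
    keep = ~transient_mask                          # only static pixels become Gaussians
    pix = np.stack([u[keep], v[keep], np.ones(keep.sum())], axis=0)  # (3, N) homogeneous
    rays = np.linalg.inv(K) @ pix                   # back-project pixels to camera rays
    return (rays * depth[keep]).T                   # (N, 3) scale rays by depth

# Toy example: 2x2 image, one transient pixel, identity intrinsics.
depth = np.array([[1.0, 2.0], [1.5, 1.0]])
mask = np.array([[False, True], [False, False]])
centers = unproject_gaussian_centers(depth, mask, np.eye(3))
print(centers.shape)  # (3, 3): the transient pixel contributes no Gaussian
```

In a full system each center would also carry predicted covariance, opacity, and color, and would be mapped into a shared canonical frame using the predicted camera poses; this sketch covers only the geometric unprojection and masking.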