Long-tail Internet photo reconstruction
arXiv cs.CV / 4/27/2026
Key Points
- The paper highlights a “long-tail” challenge in Internet photo-to-3D reconstruction: well-known landmarks have abundant, clean imagery and are easy to reconstruct, while most sites have sparse, noisy, uneven photos that break both classical and learned 3D methods.
- It argues that reconstructing this long tail is a key next frontier for 3D foundation models, but that reliable ground-truth supervision is hard to obtain for sparsely photographed scenes.
- The authors propose simulating ground-truth supervision by sampling sparse subsets from well-reconstructed Internet landmarks, creating training conditions that resemble long-tail camera distributions.
- They introduce MegaDepth-X, a large dataset of 3D reconstructions with clean, dense depth, along with a sampling strategy to form training image sets for extreme sparsity.
- Fine-tuning 3D foundation models using MegaDepth-X and the sampling approach improves robustness under extreme sparsity and helps in symmetric/repetitive scenes without losing performance on standard dense 3D benchmarks.
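
The subset-sampling idea in the key points can be illustrated with a minimal sketch. The summary does not describe the authors' actual sampling strategy, so the code below assumes plain uniform random sampling as a stand-in; the names `sample_sparse_subset` and `make_training_sets`, the scene layout (a dict from image id to its supervision record), and the subset sizes are all hypothetical.

```python
import random


def sample_sparse_subset(image_ids, k, seed=None):
    """Pick k image ids uniformly at random from a densely reconstructed scene.

    Assumption: uniform sampling is only a placeholder for the paper's strategy.
    """
    ids = list(image_ids)
    if not 1 <= k <= len(ids):
        raise ValueError("k must be between 1 and the number of images")
    rng = random.Random(seed)
    return sorted(rng.sample(ids, k))


def make_training_sets(scene, sizes=(2, 4, 8), sets_per_size=3, seed=0):
    """Build sparse training sets from one well-reconstructed scene.

    `scene` maps image id -> supervision (e.g. pose and depth taken from the
    full reconstruction). Each sampled set keeps that supervision, so the model
    sees sparse inputs paired with labels derived from the dense reconstruction.
    """
    rng = random.Random(seed)
    training_sets = []
    for k in sizes:
        if k > len(scene):
            continue  # scene too small for this subset size
        for _ in range(sets_per_size):
            ids = sample_sparse_subset(scene.keys(), k, seed=rng.randrange(2**32))
            training_sets.append({i: scene[i] for i in ids})
    return training_sets


# Hypothetical scene with 50 images; the supervision records are placeholders.
scene = {f"img_{i:03d}": {"pose": None, "depth": None} for i in range(50)}
sets = make_training_sets(scene)
print([len(s) for s in sets])
```

The point of the sketch is only that supervision comes from the full reconstruction while the model is shown a sparse subset of it. A sampler meant to mimic long-tail camera distributions would presumably also bias which views are picked, which this sketch does not attempt.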




