In Depth We Trust: Reliable Monocular Depth Supervision for Gaussian Splatting

arXiv cs.CV / 4/8/2026

Key Points

The paper proposes a reliable way to use monocular depth priors to improve 3D Gaussian Splatting (GS) rendering, addressing issues like scale ambiguity, multi-view inconsistency, and local geometric errors from monocular depth models.
It introduces a training framework that incorporates scale-ambiguous and noisy depth priors into geometric supervision, emphasizing learning from weakly aligned depth variations.
The method includes an approach to identify ill-posed geometry so that monocular depth regularization is applied selectively, limiting the spread of depth inaccuracies into well-reconstructed 3D regions.
Experiments across multiple datasets report consistent gains in geometric accuracy and improved rendering quality across different GS variants and different monocular depth backbones.
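
To make the scale-ambiguity issue concrete, here is a minimal sketch of a common baseline: fitting a per-image scale and shift that maps a monocular depth prediction onto reference depths by least squares. This is not taken from the paper, which does not spell out its alignment procedure in this summary; the function name `align_scale_shift` and the toy values are assumptions for illustration only.

```python
def align_scale_shift(mono, ref):
    """Least-squares fit of (s, t) so that s * mono[i] + t approximates ref[i].

    mono: monocular depth values (arbitrary scale and offset).
    ref:  reference depths at the same pixels (e.g. sparse SfM points).
    Both names are hypothetical; this is a generic baseline, not the paper's method.
    """
    n = len(mono)
    if n != len(ref) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_m = sum(mono) / n
    mean_r = sum(ref) / n
    var_m = sum((m - mean_m) ** 2 for m in mono)
    if var_m == 0.0:
        raise ValueError("monocular depth is constant; scale is undetermined")
    cov = sum((m - mean_m) * (r - mean_r) for m, r in zip(mono, ref))
    s = cov / var_m            # closed-form slope of simple linear regression
    t = mean_r - s * mean_m    # intercept follows from the means
    return s, t


# Toy data: the reference is an exact affine transform of the prediction.
mono = [1.0, 2.0, 3.0, 4.0]
ref = [2.0 * m + 0.5 for m in mono]
s, t = align_scale_shift(mono, ref)
aligned = [s * m + t for m in mono]
```

A single global fit like this removes scale and shift but cannot repair multi-view inconsistency or local geometric errors, which is the gap the key points above describe.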

Abstract

Using accurate depth priors in 3D Gaussian Splatting helps mitigate artifacts caused by sparse training data and textureless surfaces. However, acquiring accurate depth maps requires specialized acquisition systems. Foundation monocular depth estimation models offer a cost-effective alternative, but they suffer from scale ambiguity, multi-view inconsistency, and local geometric inaccuracies, which can degrade rendering performance when applied naively. This paper addresses the challenge of reliably leveraging monocular depth priors for Gaussian Splatting (GS) rendering enhancement. To this end, we introduce a training framework integrating scale-ambiguous and noisy depth priors into geometric supervision. We highlight the importance of learning from weakly aligned depth variations. We introduce a method to isolate ill-posed geometry for selective monocular depth regularization, restricting the propagation of depth inaccuracies into well-reconstructed 3D structures. Extensive experiments across diverse datasets show consistent improvements in geometric accuracy, leading to more faithful depth estimation and higher rendering quality across different GS variants and monocular depth backbones tested.
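
The abstract does not give the loss formulation, so the following is only an illustrative sketch of the two ideas it names: supervising depth variations (differences between neighboring pixels) rather than absolute depth, and applying that supervision only inside a mask of regions flagged as ill-posed. The function `masked_variation_loss`, the L1 penalty, and the way the mask is supplied are all assumptions, not the authors' method.

```python
def masked_variation_loss(rendered, prior, mask):
    """Mean L1 difference between neighbor-to-neighbor depth changes.

    rendered, prior: 2D lists of depth values with identical shape.
    mask: 2D list of booleans; True marks pixels assumed to be ill-posed.
    A neighbor pair contributes only if both of its pixels are masked.
    Hypothetical sketch; the paper's actual loss may differ.
    """
    h, w = len(rendered), len(rendered[0])
    total, count = 0.0, 0
    for i in range(h):
        for j in range(w):
            if not mask[i][j]:
                continue
            # Right and down neighbors cover every adjacent pair exactly once.
            for di, dj in ((0, 1), (1, 0)):
                ni, nj = i + di, j + dj
                if ni < h and nj < w and mask[ni][nj]:
                    d_rend = rendered[ni][nj] - rendered[i][j]
                    d_prior = prior[ni][nj] - prior[i][j]
                    total += abs(d_rend - d_prior)
                    count += 1
    return total / count if count else 0.0


rendered = [[0.0, 1.0], [0.0, 1.0]]
prior = [[0.0, 3.0], [0.0, 3.0]]
all_on = [[True, True], [True, True]]
all_off = [[False, False], [False, False]]

loss_full = masked_variation_loss(rendered, prior, all_on)
loss_none = masked_variation_loss(rendered, prior, all_off)
# Adding a constant offset to the prior leaves its differences unchanged.
shifted = [[v + 10.0 for v in row] for row in rendered]
loss_shift = masked_variation_loss(rendered, shifted, all_on)
```

Because the loss compares differences, a constant depth offset in the prior contributes nothing, and pixels outside the mask are never pulled toward the prior. A scale mismatch still affects the differences, so some form of alignment would be needed alongside it.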