Hybrid Latents -- Geometry-Appearance-Aware Surfel Splatting

arXiv cs.CV · April 17, 2026

Key Points

  • The paper presents a hybrid radiance representation that combines Gaussian splatting with a hash-grid and adds per-Gaussian latent features to separate geometry from appearance more effectively than prior NeRF-style methods.
  • By explicitly biasing the optimizer toward low- vs. high-frequency components and using hard opacity falloffs, the method reduces the chance that high-frequency textures will mask or compensate for geometry errors.
  • It improves efficiency by pruning redundant Gaussians probabilistically and applying a sparsity-inducing BCE-based opacity loss to keep only a minimal set of primitives.
  • Experiments on synthetic and real-world datasets show better reconstruction fidelity than state-of-the-art Gaussian-based novel-view synthesis, while using about an order of magnitude fewer Gaussians.
  • Overall, the work aims to make 2D Gaussian scene reconstruction from multi-view images both more accurate and more computationally efficient, through frequency-aware latent modeling and aggressive model compaction.
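To make the compaction idea in the last two bullets concrete, here is a minimal NumPy sketch of a sparsity-inducing self-BCE opacity loss (which pushes each opacity toward 0 or 1) paired with probabilistic pruning (keeping each Gaussian with probability equal to its opacity). The exact loss form and pruning rule in the paper may differ; this is an illustration of the general mechanism, not the authors' implementation.

```python
import numpy as np

def bce_opacity_loss(opacities, eps=1e-6):
    """Self-BCE (binary entropy) on opacities: minimized when each
    opacity is near 0 or 1, so near-transparent Gaussians can be
    cleanly switched off.  Hypothetical form, for illustration."""
    o = np.clip(opacities, eps, 1.0 - eps)
    return float(np.mean(-(o * np.log(o) + (1.0 - o) * np.log(1.0 - o))))

def probabilistic_prune(opacities, rng):
    """Keep each Gaussian with probability equal to its opacity, so
    primitives driven toward zero opacity are likely to be removed."""
    return rng.random(opacities.shape) < opacities

# Binarized opacities incur a lower loss than ambiguous ones.
near_binary = np.array([0.001, 0.999])
ambiguous = np.array([0.5, 0.5])
print(bce_opacity_loss(near_binary) < bce_opacity_loss(ambiguous))  # True
```

In training, such a loss would be added to the photometric objective with a small weight, and pruning applied periodically so that the surviving set of Gaussians stays minimal.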

Abstract

We introduce a hybrid Gaussian-hash-grid radiance representation for reconstructing 2D Gaussian scene models from multi-view images. Similar to NeST splatting, our approach reduces the entanglement between geometry and appearance common in NeRF-based models, but adds per-Gaussian latent features alongside hash-grid features to bias the optimizer toward a separation of low- and high-frequency scene components. This explicit frequency-based decomposition reduces the tendency of high-frequency texture to compensate for geometric errors. Encouraging Gaussians with hard opacity falloffs further strengthens the separation between geometry and appearance, improving both geometry reconstruction and rendering efficiency. Finally, probabilistic pruning combined with a sparsity-inducing BCE opacity loss allows redundant Gaussians to be turned off, yielding a minimal set of Gaussians sufficient to represent the scene. Using both synthetic and real-world datasets, we compare against the state of the art in Gaussian-based novel-view synthesis and demonstrate superior reconstruction fidelity with an order of magnitude fewer primitives.
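The hybrid representation described above pairs a per-Gaussian latent (biased toward low-frequency content) with a spatial hash-grid feature (biased toward high-frequency content). A minimal sketch of how such features might be combined is below; the hash function, grid resolution, and concatenation scheme are assumptions for illustration (real hash-grid encoders such as Instant-NGP use multi-resolution trilinear lookups, and the paper's decoder architecture is not specified here).

```python
import numpy as np

def hash_grid_feature(xyz, table, resolution=16):
    """Nearest-cell lookup into a spatial hash table of features.
    Hypothetical single-level variant of a hash-grid encoding."""
    primes = np.array([1, 2654435761, 805459861], dtype=np.uint64)
    cell = np.floor(np.asarray(xyz) * resolution).astype(np.uint64)
    idx = int(np.bitwise_xor.reduce(cell * primes)) % table.shape[0]
    return table[idx]

def hybrid_feature(gaussian_latent, xyz, table):
    """Concatenate the per-Gaussian latent (low-frequency component)
    with the hash-grid feature sampled at the Gaussian's position
    (high-frequency component); a small MLP would decode the result."""
    return np.concatenate([gaussian_latent, hash_grid_feature(xyz, table)])

table = np.random.default_rng(1).normal(size=(64, 4))   # toy hash table
latent = np.zeros(8)                                    # toy per-Gaussian latent
feat = hybrid_feature(latent, [0.3, 0.5, 0.7], table)
print(feat.shape)  # (12,)
```

Keeping the two feature sources separate is what lets the optimizer bias each toward a frequency band, rather than letting one component absorb errors of the other.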