High-fidelity Multi-view Normal Integration with Scale-encoded Neural Surface Representation

arXiv cs.CV / 3/24/2026


Key Points

  • The paper identifies a core limitation of existing multi-view normal integration: sampling only one ray per pixel ignores the pixel’s spatial coverage, which changes with camera intrinsics and object distance.
  • When the same object is captured from different distances, corresponding pixels in different views can yield inconsistent normal estimates, which blurs high-frequency surface details in the reconstruction.
  • It proposes a scale-encoded neural surface representation that explicitly incorporates per-pixel coverage area by associating 3D points with a spatial scale and deriving normals via a hybrid grid-based encoding.
  • The method also adds a scale-aware mesh extraction module that assigns an optimal local scale to each mesh vertex based on training observations, improving reconstruction under varying capture distances.
  • Experiments show the approach produces consistently higher-fidelity reconstructions from normals observed at different distances and outperforms prior multi-view normal integration methods.
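The pixel-coverage observation behind the first two points can be made concrete: by similar triangles, the world-space extent a single pixel covers grows linearly with camera-to-object distance and shrinks with focal length. A minimal sketch, assuming a simple pinhole model (the function name and parameters are illustrative, not from the paper):

```python
def pixel_footprint(depth: float, focal_px: float) -> float:
    """World-space side length covered by one pixel at a given depth.

    A 1-pixel step on the image plane spans depth / focal_px world
    units at that depth, so doubling the camera-to-object distance
    doubles the footprint and halves the spatial frequency that a
    single per-pixel normal sample can resolve.
    """
    return depth / focal_px

# The same surface point seen from 1 m and 2 m with a 1000 px focal length:
near = pixel_footprint(1.0, 1000.0)  # 0.001 m per pixel
far = pixel_footprint(2.0, 1000.0)   # 0.002 m per pixel
```

This is why a single ray per pixel is a lossy model: the far view's normal is effectively an average over a footprint four times the area of the near view's.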

Abstract

Previous multi-view normal integration methods typically sample a single ray per pixel, without considering the spatial area covered by each pixel, which varies with camera intrinsics and the camera-to-object distance. Consequently, when the target object is captured at different distances, the normals at corresponding pixels may differ across views. This multi-view surface normal inconsistency results in the blurring of high-frequency details in the reconstructed surface. To address this issue, we propose a scale-encoded neural surface representation that incorporates the pixel coverage area into the neural representation. By associating each 3D point with a spatial scale and calculating its normal from a hybrid grid-based encoding, our method effectively represents multi-scale surface normals captured at varying distances. Furthermore, to enable scale-aware surface reconstruction, we introduce a mesh extraction module that assigns an optimal local scale to each vertex based on the training observations. Experimental results demonstrate that our approach consistently yields high-fidelity surface reconstruction from normals observed at varying distances, outperforming existing multi-view normal integration methods.
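The abstract does not spell out how the spatial scale interacts with the hybrid grid-based encoding, but one plausible mechanism (a hedged sketch, not the paper's actual algorithm) is to match each query's scale to a level of a multi-resolution feature grid: a coarse-scale query (a distant capture) reads coarse grid cells and so cannot alias detail finer than its pixel footprint. The selection rule below is hypothetical:

```python
def select_grid_level(scale: float, base_cell: float, num_levels: int,
                      growth: float = 2.0) -> int:
    """Pick the multi-resolution grid level matched to a query scale.

    Hypothetical rule: level 0 has cell size `base_cell`, and each
    finer level shrinks cells by a factor of `growth`. Return the
    finest level whose cell size still covers `scale`, so queries at
    coarse scales read correspondingly coarse features.
    """
    level = 0
    cell = base_cell
    while level + 1 < num_levels and cell / growth >= scale:
        cell /= growth
        level += 1
    return level

# A query at 1/4 of the base cell size lands two levels down:
# select_grid_level(0.25, 1.0, 8) -> 2
```

The paper's scale-aware mesh extraction would then amount to running this kind of lookup per vertex, using the scales actually observed during training to decide which level's geometry each vertex should reflect.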