DirectFisheye-GS: Enabling Native Fisheye Input in Gaussian Splatting with Cross-View Joint Optimization

arXiv cs.CV / 4/2/2026

Key Points

  • The paper proposes DirectFisheye-GS, a method that incorporates the fisheye camera model directly into 3D Gaussian Splatting (3DGS) to train from native fisheye images without undistortion preprocessing.
  • It explains why common undistortion-and-resampling pipelines hurt reconstruction quality: black borders discard image information, and resampling dilutes detail, which leads to blur and floating artifacts.
  • Even with correct fisheye modeling, the authors find residual “floaters” near image edges caused by increased peripheral distortion and 3DGS’s per-iteration random view selection that fails to capture cross-view correlations.
  • To fix this, they introduce a feature-overlap-driven cross-view joint optimization that enforces consistent geometric and photometric constraints across views, improving stability and fidelity.
  • DirectFisheye-GS is reported to match or exceed state-of-the-art results on public datasets, and the optimization idea is claimed to be applicable to pinhole-camera pipelines as well.
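
To make the first two points concrete, here is a minimal sketch comparing a fisheye projection with a pinhole projection. The summary does not say which fisheye model the paper uses; the equidistant model (r = f·θ) below is an assumption chosen for simplicity, and all function names are illustrative rather than taken from the authors' code.

```python
import math


def project_equidistant(point, f, cx, cy):
    """Project a camera-space point with the equidistant fisheye model r = f * theta.

    Assumption: equidistant model; the paper's actual camera model may differ.
    """
    x, y, z = point
    rho = math.hypot(x, y)        # distance of the point from the optical axis
    theta = math.atan2(rho, z)    # angle between the viewing ray and the optical axis
    if rho == 0.0:
        return (cx, cy)           # a point on the axis lands on the principal point
    r = f * theta                 # image radius grows linearly with the angle
    return (cx + r * x / rho, cy + r * y / rho)


def project_pinhole(point, f, cx, cy):
    """Project with the ideal pinhole model r = f * tan(theta)."""
    x, y, z = point
    if z <= 0.0:
        # Rays at or beyond 90 degrees off-axis have no pinhole image.
        raise ValueError("point is not in front of the pinhole camera")
    return (cx + f * x / z, cy + f * y / z)


def radial_stretch(theta):
    """Radial magnification when equidistant pixels are resampled to pinhole.

    d(f * tan(theta)) / d(f * theta) = 1 / cos(theta)^2
    """
    return 1.0 / math.cos(theta) ** 2


if __name__ == "__main__":
    for deg in (0, 30, 60, 80):
        print(deg, round(radial_stretch(math.radians(deg)), 2))
```

Under this assumed model, the radial stretch is about 4× at 60° off-axis and grows quickly beyond that, so undistortion spreads the same source pixels over a much larger output area. A ray at or beyond 90° cannot be represented in a pinhole image at all, which is one reason undistorted images lose part of the fisheye's field of view.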

Abstract

3D Gaussian Splatting (3DGS) has enabled efficient 3D scene reconstruction from everyday images with real-time, high-fidelity rendering, greatly advancing VR/AR applications. Fisheye cameras, with their wider field of view (FOV), promise high-quality reconstructions from fewer inputs and have recently attracted much attention. However, since 3DGS relies on rasterization, most subsequent works involving fisheye camera inputs first undistort the images before training, which introduces two problems: 1) black borders at image edges cause information loss and negate the fisheye's large FOV advantage; 2) undistortion's stretch-and-interpolate resampling spreads each pixel's value over a larger area, diluting detail density, which causes 3DGS to overfit these low-frequency zones and produce blur and floating artifacts. In this work, we integrate a fisheye camera model into the original 3DGS framework, enabling native fisheye image input for training without preprocessing. Despite correct modeling, we observed that the reconstructed scenes still exhibit floaters at image edges: distortion increases toward the periphery, and 3DGS's original per-iteration random view selection ignores the cross-view correlations of a Gaussian, leading to extreme shapes (e.g., oversized or elongated) that degrade reconstruction quality. To address this, we introduce a feature-overlap-driven cross-view joint optimization strategy that establishes consistent geometric and photometric constraints across views, a technique equally applicable to existing pinhole-camera-based pipelines. Our DirectFisheye-GS matches or surpasses state-of-the-art performance on public datasets.
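
The abstract does not spell out how feature overlap is measured or how views are grouped, so the following is only a rough sketch of the general idea: instead of optimizing against one random view per iteration, pick an anchor view and the views that share the most features with it. The use of Jaccard overlap on feature-track IDs, the `tracks_by_view` input, and the function names are all assumptions made for illustration, not the authors' method.

```python
import random


def jaccard(a, b):
    """Jaccard overlap between two sets of feature-track IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_joint_views(tracks_by_view, k=2, anchor=None, rng=None):
    """Return an anchor view plus its k most-overlapping partner views.

    tracks_by_view: dict mapping a view name to the set of feature-track IDs
    observed in that view (hypothetical input, e.g. derived from SfM matches).
    """
    if anchor is None:
        # Keep the random anchor choice of standard training, but make it
        # reproducible by sorting the view names first.
        anchor = (rng or random).choice(sorted(tracks_by_view))
    anchor_tracks = tracks_by_view[anchor]
    partners = [v for v in tracks_by_view if v != anchor]
    # Highest overlap first; ties are broken by view name for determinism.
    partners.sort(key=lambda v: (-jaccard(anchor_tracks, tracks_by_view[v]), v))
    return [anchor] + partners[:k]


if __name__ == "__main__":
    tracks = {
        "a": {1, 2, 3, 4},
        "b": {3, 4, 5},
        "c": {1, 2, 3},
        "d": {9},
    }
    print(pick_joint_views(tracks, k=2, anchor="a"))
```

In a training loop, the photometric losses of the returned views would then be accumulated before a single optimizer step, so that a Gaussian visible in several of them is constrained by all of them at once rather than by one view at a time.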