Fisheye3R: Adapting Unified 3D Feed-Forward Foundation Models to Fisheye Lenses

arXiv cs.CV / 4/1/2026

Key Points

The paper argues that feed-forward foundation models for multi-view 3D reconstruction degrade on fisheye (wide field-of-view) images because the non-linear fisheye projection places pixels at spatial positions that perspective-trained models never encountered during training.
It proposes Fisheye3R, an adaptation framework designed to extend existing multi-view 3D reconstruction foundation models to natively handle fisheye inputs while avoiding regression on perspective images.
To overcome limited fisheye data and scarce ground-truth supervision, the authors introduce flexible learning strategies that enable self-supervised adaptation using only unlabeled perspective images.
They also present a supervised adaptation mode that can improve fisheye performance without requiring any fisheye training data.
Experiments on three foundation models (VGGT, π^3, and MapAnything) show consistent gains in camera pose, depth, point maps, and field-of-view estimation for fisheye imagery.

Abstract

Feed-forward foundation models for multi-view 3-dimensional (3D) reconstruction have been trained on large-scale datasets of perspective images; when tested on wide field-of-view images, e.g., from a fisheye camera, their performance degrades. Their error arises from changes in spatial positions of pixels due to a non-linear projection model that maps 3D points onto the 2D image plane. While one may surmise that training on fisheye images would resolve this problem, there are far fewer fisheye images with ground truth than perspective images, which limits generalization. To enable inference on imagery exhibiting high radial distortion, we propose Fisheye3R, a novel adaptation framework that extends these multi-view 3D reconstruction foundation models to natively accommodate fisheye inputs without performance regression on perspective images. To address the scarcity of fisheye images and ground truth, we introduce flexible learning schemes that support self-supervised adaptation using only unlabeled perspective images and supervised adaptation without any fisheye training data. Extensive experiments across three foundation models, including VGGT, π^3, and MapAnything, demonstrate that our approach consistently improves camera pose, depth, point map, and field-of-view estimation on fisheye images.
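To make the abstract's point about pixel positions concrete, here is a minimal sketch comparing where a 3D ray lands under a pinhole (perspective) projection and under a fisheye projection. The abstract does not name a specific fisheye model, so the equidistant model (r = f * theta) is used here purely as an assumption for illustration; the function names and the focal length of 300 pixels are likewise hypothetical and do not come from the paper.

```python
import math


def perspective_radius(theta: float, focal: float) -> float:
    """Pinhole model: image radius r = f * tan(theta)."""
    return focal * math.tan(theta)


def equidistant_radius(theta: float, focal: float) -> float:
    """Equidistant fisheye model (assumed for illustration): r = f * theta."""
    return focal * theta


def radial_shift_table(focal: float, angles_deg: list[float]):
    """For each incidence angle (degrees from the optical axis), return
    (angle, perspective radius, fisheye radius, difference) in pixels."""
    rows = []
    for deg in angles_deg:
        theta = math.radians(deg)
        r_persp = perspective_radius(theta, focal)
        r_fish = equidistant_radius(theta, focal)
        rows.append((deg, r_persp, r_fish, r_persp - r_fish))
    return rows


if __name__ == "__main__":
    # Hypothetical focal length in pixels, chosen only for the demo.
    for deg, r_p, r_f, shift in radial_shift_table(300.0, [5, 20, 40, 60, 80]):
        print(f"{deg:>3} deg  perspective={r_p:9.1f}px  "
              f"fisheye={r_f:7.1f}px  shift={shift:9.1f}px")
```

Under these assumptions the two models nearly agree close to the optical axis and diverge rapidly toward the image periphery, which is one way to see why a model trained only on perspective images encounters unfamiliar pixel layouts on fisheye input.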
