Benchmarking Multi-View BEV Object Detection with Mixed Pinhole and Fisheye Cameras

arXiv cs.RO / 3/31/2026

Key Points

Addresses BEV (Bird's-Eye View) 3D object detection in mixed pinhole/fisheye camera setups, where existing models, designed mainly for pinhole cameras, lose accuracy under fisheye distortion.
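Builds a new real-data benchmark for multi-view BEV detection with fisheye and pinhole images by converting KITTI-360 into nuScenes format. It then systematically compares several adaptation strategies: rectification for zero-shot evaluation, fine-tuning, and distortion-aware view transformation (VT).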
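Evaluates modifications such as a distortion-aware VT module (VTM) based on the MEI camera model and a polar coordinate representation on three representative BEV architectures: BEVFormer, BEVDet, and PETR.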
The results show that projection-free architectures are inherently more robust and effective against fisheye distortion than the other VTMs.
The code is publicly released, and the work offers practical guidelines for designing robust, cost-effective 3D perception systems that include fisheye cameras.
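The MEI (unified omnidirectional) camera model mentioned above maps a 3D point to the image in stages: project it onto a unit sphere, re-project it from a center shifted along the optical axis by a mirror parameter xi, apply radial-tangential distortion, and scale by generalized focal lengths. The following is a minimal sketch of that projection, of the kind a distortion-aware VTM could call in place of a pinhole projection when mapping 3D reference points into fisheye images. It is not the authors' FishBEVOD code; the `MEIIntrinsics` container, the `project_mei` function, and all numeric values are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class MEIIntrinsics:
    """Parameters of the MEI camera model (hypothetical container).

    Field names follow the usual MEI formulation; values used with this
    class below are placeholders, not a real calibration.
    """
    xi: float      # mirror parameter; xi = 0 reduces to a pinhole model
    k1: float      # radial distortion coefficients
    k2: float
    p1: float      # tangential distortion coefficients
    p2: float
    gamma1: float  # generalized focal lengths (pixels)
    gamma2: float
    u0: float      # principal point (pixels)
    v0: float


def project_mei(point_cam, cam: MEIIntrinsics):
    """Project a 3D point in the camera frame to (u, v) pixel coordinates.

    Returns None when the shifted denominator is non-positive, i.e. the
    point falls outside the region this sketch treats as projectable.
    """
    x, y, z = point_cam
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        return None
    # 1. Project onto the unit sphere.
    xs, ys, zs = x / norm, y / norm, z / norm
    # 2. Shift the projection center by xi along the optical axis.
    denom = zs + cam.xi
    if denom <= 1e-9:
        return None
    # 3. Perspective division onto the normalized image plane.
    mx, my = xs / denom, ys / denom
    # 4. Radial-tangential (Brown) distortion.
    r2 = mx * mx + my * my
    radial = 1.0 + cam.k1 * r2 + cam.k2 * r2 * r2
    dx = 2.0 * cam.p1 * mx * my + cam.p2 * (r2 + 2.0 * mx * mx)
    dy = cam.p1 * (r2 + 2.0 * my * my) + 2.0 * cam.p2 * mx * my
    mx_d = mx * radial + dx
    my_d = my * radial + dy
    # 5. Map to pixels with the generalized intrinsics.
    return cam.gamma1 * mx_d + cam.u0, cam.gamma2 * my_d + cam.v0


if __name__ == "__main__":
    fish = MEIIntrinsics(xi=1.2, k1=0.0, k2=0.0, p1=0.0, p2=0.0,
                         gamma1=300.0, gamma2=300.0, u0=700.0, v0=700.0)
    # A point 90 degrees off-axis (z = 0) still projects when xi > 0.
    print(project_mei((1.0, 0.0, 0.0), fish))
```

With `xi = 0` and all distortion terms set to zero, the function reduces to an ordinary pinhole projection; with `xi > 0`, rays at or beyond 90 degrees from the optical axis, which a pinhole model cannot represent, still land in the image. This is the property that lets a single VTM handle both camera types when it is parameterized by the MEI model.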

Abstract

Modern autonomous driving systems increasingly rely on mixed camera configurations with pinhole and fisheye cameras for full view perception. However, Bird's-Eye View (BEV) 3D object detection models are predominantly designed for pinhole cameras, leading to performance degradation under fisheye distortion. To bridge this gap, we introduce a multi-view BEV detection benchmark with mixed cameras by converting KITTI-360 into nuScenes format. Our study encompasses three adaptations: rectification for zero-shot evaluation and fine-tuning of nuScenes-trained models, distortion-aware view transformation modules (VTMs) via the MEI camera model, and polar coordinate representations to better align with radial distortion. We systematically evaluate three representative BEV architectures, BEVFormer, BEVDet and PETR, across these strategies. We demonstrate that projection-free architectures are inherently more robust and effective against fisheye distortion than other VTMs. This work establishes the first real-data 3D detection benchmark with fisheye and pinhole images and provides systematic adaptation and practical guidelines for designing robust and cost-effective 3D perception systems. The code is available at https://github.com/CesarLiu/FishBEVOD.git.
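The abstract motivates polar coordinates as a better match for radial distortion. One common way to realize this, assumed here rather than taken from the paper, is to replace the Cartesian BEV grid with (range, azimuth) cells centered on the ego vehicle, so that each azimuth sector follows a family of viewing rays. The sketch below builds cell-center reference points for such a grid and maps an ego-frame point back to its cell. The functions `polar_bev_centers` and `polar_cell_index`, the uniform bin spacing, and the example ranges are all hypothetical.

```python
import math


def polar_bev_centers(num_r, num_theta, r_min, r_max):
    """Return (x, y) centers of a polar BEV grid, indexed [i_r][i_theta].

    Rings are uniform in range r and sectors uniform in azimuth over
    [-pi, pi); both spacings are illustrative assumptions.
    """
    dr = (r_max - r_min) / num_r
    dtheta = 2.0 * math.pi / num_theta
    centers = []
    for i in range(num_r):
        r = r_min + (i + 0.5) * dr
        ring = []
        for j in range(num_theta):
            theta = -math.pi + (j + 0.5) * dtheta
            ring.append((r * math.cos(theta), r * math.sin(theta)))
        centers.append(ring)
    return centers


def polar_cell_index(x, y, num_r, num_theta, r_min, r_max):
    """Map an ego-frame (x, y) point to its (i_r, i_theta) polar cell.

    Returns None if the point's range lies outside [r_min, r_max).
    """
    r = math.hypot(x, y)
    if not (r_min <= r < r_max):
        return None
    theta = math.atan2(y, x)  # in [-pi, pi]
    # Clamp guards against floating-point rounding just below r_max.
    i_r = min(int((r - r_min) / (r_max - r_min) * num_r), num_r - 1)
    # Modulo wraps theta == pi back into sector 0, since pi == -pi.
    i_theta = int((theta + math.pi) / (2.0 * math.pi) * num_theta) % num_theta
    return i_r, i_theta


if __name__ == "__main__":
    grid = polar_bev_centers(num_r=4, num_theta=8, r_min=1.0, r_max=51.0)
    print(grid[0][4], polar_cell_index(10.0, 0.0, 4, 8, 1.0, 51.0))
```

With uniform range bins, polar cells grow with distance from the ego vehicle, roughly tracking how image resolution per metre falls off along each ray. Whether the paper uses uniform or non-uniform bins is not stated in the abstract.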
