Leveraging Previous-Traversal Point Cloud Map Priors for Camera-Based 3D Object Detection and Tracking

arXiv cs.CV / April 29, 2026


Key Points

  • The paper addresses camera-only 3D object detection and tracking in autonomous driving, where depth ambiguity limits precise 3D localization without expensive online LiDAR at inference.
  • It proposes DualViewMapDet, which uses point-cloud maps built from previous traversals of the same environment as an online source of static geometric priors during deployment.
  • The method applies a dual-space camera–map fusion strategy: it projects the map into perspective view (PV) to enrich image features, additionally encodes the map directly in bird's-eye view (BEV), and fuses both streams in a shared metric space (see the sketches after this list and the abstract).
  • Experiments on nuScenes and Argoverse 2 show consistent improvements over strong camera-only baselines, with especially large gains in object localization, and ablations confirm the value of PV/BEV fusion and map coverage.
  • The authors release code and pre-trained models publicly to support replication and further research.
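
To make the PV side of the dual-space design concrete, the sketch below shows how a static point-cloud map could be rasterized into perspective-view geometric cue channels. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name, the depth/height channel choice, and the coordinate conventions are all hypothetical.

```python
# Hypothetical sketch: render a prior point-cloud map into perspective-view
# (PV) cue channels. Channel set (depth, height) and all names are assumptions.
import numpy as np

def render_map_to_pv(points_ego: np.ndarray,   # (N, 3) map points in the ego frame
                     cam_from_ego: np.ndarray, # (4, 4) extrinsic: ego -> camera
                     intrinsics: np.ndarray,   # (3, 3) camera intrinsic matrix K
                     hw: tuple) -> np.ndarray:
    """Rasterize map points into an (H, W, 2) image of depth and height cues,
    keeping the nearest point per pixel."""
    h, w = hw
    # Transform points into the camera frame.
    pts_h = np.concatenate([points_ego, np.ones((len(points_ego), 1))], axis=1)
    pts_cam = (cam_from_ego @ pts_h.T).T[:, :3]
    # Keep only points in front of the camera.
    front = pts_cam[:, 2] > 0.1
    pts_cam, heights = pts_cam[front], points_ego[front, 2]
    # Pinhole projection to pixel coordinates.
    uv = (intrinsics @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    u, v, depth = uv[:, 0].astype(int), uv[:, 1].astype(int), pts_cam[:, 2]
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, depth, heights = u[inside], v[inside], depth[inside], heights[inside]
    # Z-buffer: write far-to-near so the nearest point wins each pixel.
    order = np.argsort(-depth)
    cues = np.zeros((h, w, 2), dtype=np.float32)
    cues[v[order], u[order], 0] = depth[order]    # channel 0: metric depth
    cues[v[order], u[order], 1] = heights[order]  # channel 1: ego-frame point height
    return cues
```

Per the abstract, such cue channels would then enrich the image features and guide the camera-to-BEV lifting step.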

Abstract

Camera-based 3D object detection and tracking are central to autonomous driving, yet precise 3D object localization remains fundamentally constrained by depth ambiguity when no expensive, depth-rich online LiDAR is available at inference. In many deployments, however, vehicles repeatedly traverse the same environments, making static point cloud maps from prior traversals a practical source of geometric priors. We propose DualViewMapDet, a camera-only inference framework that retrieves such map priors online and leverages them to mitigate the absence of a LiDAR sensor during deployment. The key idea is a dual-space camera–map fusion strategy that avoids one-sided view conversion. Specifically, we (i) project the map into perspective view (PV) and encode multi-channel geometric cues to enrich image features and support bird's-eye-view (BEV) lifting, and (ii) encode the map directly in BEV with a sparse voxel backbone and fuse it with lifted camera features in a shared metric space. Extensive evaluations on nuScenes and Argoverse 2 demonstrate consistent improvements over strong camera-only baselines, with particularly strong gains in object localization. Ablations further validate the contributions of PV/BEV fusion and prior-map coverage. We make the code and pre-trained models available at https://dualviewmapdet.cs.uni-freiburg.de.
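
For the BEV side of the fusion, the following is a minimal sketch under stated assumptions: a small dense convolutional encoder stands in for the paper's sparse voxel backbone, and concatenation followed by convolution stands in for whatever fusion operator the authors actually use. All module and parameter names are hypothetical.

```python
# Hypothetical sketch: fuse a BEV-encoded map with lifted camera BEV features
# on a shared metric grid. Not the paper's architecture; names are assumptions.
import torch
import torch.nn as nn

class BEVMapFusion(nn.Module):
    def __init__(self, cam_ch: int = 128, map_ch: int = 32, out_ch: int = 128):
        super().__init__()
        # Dense stand-in for the sparse voxel map encoder described in the paper.
        self.map_encoder = nn.Sequential(
            nn.Conv2d(map_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Fuse camera and map features that live on the same metric BEV grid.
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_ch + 64, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, cam_bev: torch.Tensor, map_bev: torch.Tensor) -> torch.Tensor:
        # cam_bev: (B, cam_ch, H, W) camera features lifted to BEV
        # map_bev: (B, map_ch, H, W) rasterized map channels (e.g., occupancy, height)
        return self.fuse(torch.cat([cam_bev, self.map_encoder(map_bev)], dim=1))

# Example: a 100 m x 100 m grid at 0.5 m resolution gives 200 x 200 BEV cells.
fused = BEVMapFusion()(torch.randn(1, 128, 200, 200), torch.randn(1, 32, 200, 200))
```

Because both inputs are expressed on the same metric grid, the fusion needs no further view conversion, which is the property the dual-space design is built around.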