Robust Fusion of Object-Level V2X for Learned 3D Object Detection

arXiv cs.CV / 5/4/2026


Key Points

  • The paper addresses limitations of onboard-only perception for automated driving by exploring how object-level V2X messages can complement onboard sensors in 3D object detection.
  • Using the nuScenes dataset, the authors emulate object-level cooperative awareness messages from ground truth, inject controlled noise and object dropout to mimic latency, localization errors, and low V2X penetration rates, and convert the messages into a dedicated BEV input.
  • When fused into a BEVFusion-style detector, object-level V2X information can substantially improve detection, reaching a nuScenes Detection Score (NDS) of 0.80 under favorable conditions, but models trained on idealized data become fragile and over-reliant on V2X.
  • The authors propose a noise-aware training approach with explicit confidence encoding, which improves robustness and preserves performance gains even under severe V2X imperfections and low penetration rates.
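The message emulation described in the key points can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `emulate_v2x_messages`, the `[x, y, z, length, width, height, yaw]` box layout, the Gaussian noise model, and all default values are assumptions chosen for the example.

```python
import numpy as np

def emulate_v2x_messages(gt_boxes, penetration=0.5, pos_sigma=0.5,
                         yaw_sigma=0.05, latency=0.1, velocities=None, rng=None):
    """Turn ground-truth boxes into imperfect object-level messages.

    gt_boxes:   (N, 7) array, assumed layout [x, y, z, length, width, height, yaw].
    velocities: optional (N, 2) array of planar velocities in m/s.
    Returns the degraded boxes of the objects that "sent" a message, plus the
    boolean mask of which ground-truth objects were kept.
    """
    rng = np.random.default_rng() if rng is None else rng
    boxes = np.asarray(gt_boxes, dtype=float).reshape(-1, 7).copy()
    n = boxes.shape[0]

    # Object dropout: only a fraction of objects is assumed to be V2X-equipped.
    keep = rng.random(n) < penetration

    # Latency (assumed model): the message reports where the object was
    # `latency` seconds ago, so the position is shifted back along the velocity.
    if velocities is not None:
        boxes[:, :2] -= np.asarray(velocities, dtype=float).reshape(-1, 2) * latency

    # Localization error (assumed Gaussian) on planar position and heading.
    boxes[:, :2] += rng.normal(0.0, pos_sigma, size=(n, 2))
    boxes[:, 6] += rng.normal(0.0, yaw_sigma, size=n)

    return boxes[keep], keep
```

Sampling `penetration` and the noise scales anew for each training sample is one plausible way to expose a detector to a range of V2X quality levels; the summary does not state how the authors schedule these parameters.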

Abstract

Perception for automated driving is largely based on onboard environmental sensors, such as cameras and radar, which are cost-effective but limited by line-of-sight and field-of-view constraints. These inherent limitations may cause onboard perception to fail under occlusions or poor visibility conditions. In parallel, cooperative awareness via vehicle-to-everything (V2X) communication is becoming increasingly available, enabling vehicles and infrastructure to share their own state as object-level information that complements onboard perception. In this work, we study how such V2X information can be integrated into 3D object detection and how robust the resulting system is to realistic V2X imperfections. Using the nuScenes dataset, we emulate object-level cooperative awareness messages from ground truth, injecting controlled noise and object dropout to mimic real-world conditions such as latency, localization errors, and low V2X penetration rates. We convert these messages into a dedicated bird's-eye view (BEV) input and fuse them into a BEVFusion-style detector. Our results demonstrate that while object-level cooperative information can substantially improve detection performance, achieving an NDS of 0.80 under favorable conditions, models trained on idealized data become fragile and over-reliant on V2X. Conversely, our proposed noise-aware training strategy, coupled with explicit confidence encoding, enhances robustness, maintaining performance gains even under severe noise and reduced V2X penetration.
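To make the "dedicated BEV input" with "explicit confidence encoding" more concrete, the sketch below rasterizes object-level messages into a two-channel grid. The abstract does not specify the encoding, so everything here is an assumption for illustration: the function name `rasterize_v2x_bev`, the two-channel layout (occupancy plus per-cell confidence), the grid extent and resolution, and the `[x, y, z, length, width, height, yaw]` message layout in the ego frame.

```python
import numpy as np

def rasterize_v2x_bev(messages, confidences, bev_range=51.2, resolution=0.8):
    """Rasterize object-level messages into a (2, H, W) BEV tensor.

    Channel 0: binary occupancy of the reported box footprints.
    Channel 1: confidence assigned to the message covering each cell
               (hypothetical encoding; the maximum wins where boxes overlap).
    """
    size = int(round(2 * bev_range / resolution))
    # Metric coordinates of the cell centers along one axis.
    centers = (np.arange(size) + 0.5) * resolution - bev_range
    # gx varies along columns (x), gy along rows (y).
    gx, gy = np.meshgrid(centers, centers, indexing="xy")

    bev = np.zeros((2, size, size), dtype=np.float32)
    for box, conf in zip(messages, confidences):
        x, y, length, width, yaw = box[0], box[1], box[3], box[4], box[6]
        dx, dy = gx - x, gy - y
        # Rotate the cell offsets into the box frame to test the footprint.
        lon = np.cos(yaw) * dx + np.sin(yaw) * dy
        lat = -np.sin(yaw) * dx + np.cos(yaw) * dy
        inside = (np.abs(lon) <= length / 2) & (np.abs(lat) <= width / 2)
        bev[0][inside] = 1.0
        bev[1][inside] = np.maximum(bev[1][inside], conf)
    return bev
```

A tensor of this shape could then be passed to the detector alongside its camera and other sensor BEV features. Giving the confidence its own channel is one way to let the network learn how much to trust each message rather than treating every V2X box as ground truth.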