Localization-Guided Foreground Augmentation in Autonomous Driving

arXiv cs.CV / 4/22/2026


Key Points

  • The paper introduces Localization-Guided Foreground Augmentation (LG-FA) to improve autonomous driving perception in adverse visibility (rain, night, snow) where scene geometry becomes sparse or fragmented.
  • LG-FA is designed as a lightweight, plug-and-play inference module that augments foreground understanding by building a sparse global vector layer from per-frame BEV predictions.
  • It estimates the vehicle’s ego pose using class-constrained geometric alignment, which simultaneously improves localization accuracy and fills in missing local topology.
  • The augmented foreground is reprojected into a unified global frame to enhance per-frame predictions, yielding better geometric completeness and temporal stability in nuScenes experiments.
  • The authors report that LG-FA reduces localization error and produces globally consistent lane and topology reconstructions, and can be integrated into existing BEV-based perception systems without modifying the backbone.
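The first step of the pipeline described above, accumulating per-frame BEV vector predictions into a sparse global layer keyed by semantic class, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `GlobalVectorLayer`, the dict-of-arrays layout, and the `(R, t)` pose interface are all assumptions.

```python
import numpy as np

class GlobalVectorLayer:
    """Toy sketch of a sparse global vector layer: per-frame BEV vector
    predictions are transformed into a shared world frame using the
    current ego-pose estimate and accumulated per semantic class.
    (Hypothetical structure for illustration, not the paper's code.)"""

    def __init__(self):
        self.layers = {}  # class name -> (N, 2) global-frame points

    def insert(self, bev_pred, R, t):
        """bev_pred: dict mapping class name (e.g. 'lane_divider') to an
        (N, 2) array of ego-frame BEV points; (R, t) is the estimated
        2D ego pose (rotation matrix, translation) for this frame."""
        for cls, pts in bev_pred.items():
            world = (R @ pts.T).T + t            # ego -> global frame
            prev = self.layers.get(cls, np.empty((0, 2)))
            self.layers[cls] = np.vstack([prev, world])

    def query(self, cls):
        """Return all accumulated global-frame points of one class."""
        return self.layers.get(cls, np.empty((0, 2)))
```

Keeping the layer per-class is what later enables class-constrained alignment: lane-divider points are only ever matched against stored lane-divider geometry, never against road boundaries or crossings.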

Abstract

Autonomous driving systems often degrade under adverse visibility conditions (such as rain, nighttime, or snow) where online scene geometry (e.g., lane dividers, road boundaries, and pedestrian crossings) becomes sparse or fragmented. While high-definition (HD) maps can provide missing structural context, they are costly to construct and maintain at scale. We propose Localization-Guided Foreground Augmentation (LG-FA), a lightweight and plug-and-play inference module that enhances foreground perception by enriching geometric context online. LG-FA: (i) incrementally constructs a sparse global vector layer from per-frame Bird's-Eye View (BEV) predictions; (ii) estimates ego pose via class-constrained geometric alignment, jointly improving localization and completing missing local topology; and (iii) reprojects the augmented foreground into a unified global frame to improve per-frame predictions. Experiments on challenging nuScenes sequences demonstrate that LG-FA improves the geometric completeness and temporal stability of BEV representations, reduces localization error, and produces globally consistent lane and topology reconstructions. The module can be seamlessly integrated into existing BEV-based perception systems without backbone modification. By providing a reliable geometric context prior, LG-FA enhances temporal consistency and supplies stable structural support for downstream modules such as tracking and decision-making.
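Steps (ii) and (iii) of the abstract, estimating ego pose by geometrically aligning same-class structures and then reprojecting into the global frame, can be illustrated with a 2D rigid alignment. The sketch below is an assumption-laden stand-in: it uses the classical Kabsch algorithm on pre-matched point pairs, restricting correspondences to matching semantic classes, whereas the paper's actual matching and optimization details are not specified here.

```python
import numpy as np

def class_constrained_alignment(local_pts, global_pts):
    """Estimate a 2D rigid transform (R, t) aligning local BEV points to
    the global vector layer, using only same-class correspondences
    (lane divider <-> lane divider, boundary <-> boundary, ...).

    local_pts / global_pts: dicts mapping class name -> (N, 2) arrays
    with row-wise correspondences (a toy stand-in for the matching step).
    """
    shared = [c for c in local_pts if c in global_pts]
    # Stack correspondences, but only within matching classes.
    src = np.vstack([local_pts[c] for c in shared])
    dst = np.vstack([global_pts[c] for c in shared])

    # Kabsch: centre both point sets, correlate, SVD, rebuild R and t.
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def reproject(pts, R, t):
    """Map local ego-frame BEV points into the unified global frame."""
    return (R @ pts.T).T + t
```

The class constraint matters in adverse visibility: when detections are sparse, matching a lane divider against a nearby road boundary would bias the pose estimate, so restricting correspondences by class keeps the alignment, and hence the reprojected foreground, consistent.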