Failure Modes for Deep Learning-Based Online Mapping: How to Measure and Address Them

arXiv cs.CV · March 23, 2026


Key Points

  • The paper analyzes failure modes in deep learning-based online mapping due to memorization of input features and overfitting to known map geometries, and proposes a framework to disentangle these effects.
  • It introduces Fréchet distance-based reconstruction statistics and complementary failure-mode scores to quantify localization overfitting and map-geometry overfitting without threshold tuning.
  • It analyzes dataset biases with a minimum-spanning-tree (MST) diversity measure and a symmetric coverage measure to quantify geometric similarity between data splits, and proposes an MST-based sparsification strategy to reduce redundancy and improve balance.
  • Empirical results on nuScenes and Argoverse 2 across multiple state-of-the-art models show that geometry-diverse, balanced training improves generalization and supports failure-mode-aware dataset design for deployable online mapping.

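The Fréchet distance mentioned above compares a predicted map element (a polyline) against its ground-truth counterpart as whole curves, which is why no per-point matching threshold is needed. As a minimal sketch of the underlying metric (the paper's exact reconstruction statistic is not reproduced here), the discrete Fréchet distance between two polylines can be computed with dynamic programming:

```python
import numpy as np

def discrete_frechet(p, q):
    """Discrete Fréchet distance between polylines p (n,2) and q (m,2).

    ca[i, j] holds the Fréchet distance between the prefixes p[:i+1], q[:j+1].
    """
    n, m = len(p), len(q)
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # pairwise point distances
    ca = np.empty((n, m))
    ca[0, 0] = d[0, 0]
    for i in range(1, n):                      # first column: walk along p only
        ca[i, 0] = max(ca[i - 1, 0], d[i, 0])
    for j in range(1, m):                      # first row: walk along q only
        ca[0, j] = max(ca[0, j - 1], d[0, j])
    for i in range(1, n):
        for j in range(1, m):
            # best of advancing on p, on q, or on both, capped by the current pair
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]), d[i, j])
    return ca[n - 1, m - 1]
```

For example, a lane boundary predicted with a constant 1 m lateral offset from the ground truth yields a distance of exactly 1.0, regardless of how the two polylines are sampled.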
Abstract

Deep learning-based online mapping has emerged as a cornerstone of autonomous driving, yet these models frequently fail to generalize beyond familiar environments. We propose a framework to identify and measure the underlying failure modes by disentangling two effects: memorization of input features and overfitting to known map geometries. We propose measures based on evaluation subsets that control for geographical proximity and geometric similarity between training and validation scenes. We introduce Fréchet distance-based reconstruction statistics that capture per-element shape fidelity without threshold tuning, and define complementary failure-mode scores: a localization overfitting score quantifying the performance drop when geographic cues disappear, and a map geometry overfitting score measuring degradation as scenes become geometrically novel. Beyond models, we analyze dataset biases and contribute map geometry-aware diagnostics: a minimum-spanning-tree (MST) diversity measure for training sets and a symmetric coverage measure to quantify geometric similarity between splits. Leveraging these, we formulate an MST-based sparsification strategy that reduces redundancy and improves balance and performance while shrinking training-set size. Experiments on nuScenes and Argoverse 2 across multiple state-of-the-art models yield a more trustworthy assessment of generalization and show that map geometry-diverse and balanced training sets improve performance. Our results motivate failure-mode-aware evaluation protocols and map geometry-centric dataset design for deployable online mapping.
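The MST-based diagnostics can be sketched as follows. Assume a precomputed symmetric matrix `dist` of pairwise geometric distances between training scenes (e.g., Chamfer or Fréchet distances between their map geometries); the function names, the use of mean MST edge weight as the diversity score, and the greedy drop-one-endpoint sparsification rule are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def mst_edges(dist):
    """Prim's algorithm on a dense symmetric distance matrix.

    Returns the MST as a list of (parent, child, weight) edges.
    """
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()                  # cheapest connection of each node to the tree
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j, float(best[j])))
        in_tree[j] = True
        upd = dist[j] < best               # does the new node offer a cheaper connection?
        best = np.where(upd, dist[j], best)
        parent = np.where(upd, j, parent)
    return edges

def mst_diversity(dist):
    """Illustrative diversity score: mean MST edge weight over scene-to-scene distances.

    Near-duplicate scenes create short MST edges and drag the score down.
    """
    return float(np.mean([w for _, _, w in mst_edges(dist)]))

def mst_sparsify(dist, k):
    """Greedy sketch of MST-based sparsification: drop k redundant scenes by
    removing one endpoint of each of the shortest MST edges."""
    dropped = set()
    for u, v, _ in sorted(mst_edges(dist), key=lambda e: e[2]):
        if len(dropped) == k:
            break
        if u not in dropped and v not in dropped:
            dropped.add(v)                 # keep u, drop its near-duplicate v
    return [i for i in range(dist.shape[0]) if i not in dropped]
```

Dropping one endpoint of the shortest MST edges removes the most geometrically redundant scenes first, which is one way the sparsified training set can shrink while its diversity score (and, per the paper's results, downstream generalization) improves.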