AD4AD: Benchmarking Visual Anomaly Detection Models for Safer Autonomous Driving

arXiv cs.CV / April 17, 2026


Key Points

  • The paper argues that autonomous-driving vision systems can fail when real-world conditions differ from the training data distribution, creating direct physical safety risks.
  • It proposes Visual Anomaly Detection (VAD) to detect unfamiliar objects not seen during training and to alert the driver, producing pixel-level anomaly maps that highlight the regions of concern.
  • The authors benchmark eight state-of-the-art VAD methods on AnoVox, a large synthetic anomaly-detection dataset for autonomous driving.
  • They evaluate multiple backbone architectures, from large models to lightweight options like MobileNet and DeiT-Tiny, showing VAD can transfer effectively to road scenes.
  • Tiny-Dinomaly is highlighted as achieving the best accuracy-efficiency trade-off for edge deployment, providing near full-scale localization quality with far lower memory cost.

Abstract

The reliability of a machine vision system for autonomous driving depends heavily on its training data distribution. When a vehicle encounters significantly different conditions, such as atypical obstacles, its perceptual capabilities can degrade substantially. Unlike many domains where errors carry limited consequences, failures in autonomous driving translate directly into physical risk for passengers, pedestrians, and other road users. To address this challenge, we explore Visual Anomaly Detection (VAD) as a solution. VAD enables the identification of anomalous objects not present during training, allowing the system to alert the driver when an unfamiliar situation is detected. Crucially, VAD models produce pixel-level anomaly maps that can guide driver attention to specific regions of concern without requiring any prior assumptions about the nature or form of the hazard. We benchmark eight state-of-the-art VAD methods on AnoVox, the largest synthetic dataset for anomaly detection in autonomous driving. In particular, we evaluate performance across four backbone architectures spanning from large networks to lightweight ones such as MobileNet and DeiT-Tiny. Our results demonstrate that VAD transfers effectively to road scenes. Notably, Tiny-Dinomaly achieves the best accuracy-efficiency trade-off for edge deployment, matching full-scale localization performance at a fraction of the memory cost. This study represents a concrete step toward safer, more responsible deployment of autonomous vehicles, ultimately improving protection for passengers, pedestrians, and all road users.
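To make the pixel-level anomaly maps mentioned above concrete, here is a minimal illustrative sketch of one common VAD paradigm: scoring each spatial location by how poorly its feature vector is reconstructed by a model trained only on normal data. This is an assumption-laden toy (random features stand in for a real backbone such as MobileNet or DeiT-Tiny, and the `anomaly_map` function is hypothetical), not the benchmark's actual implementation.

```python
import numpy as np

def anomaly_map(features: np.ndarray, reconstructed: np.ndarray) -> np.ndarray:
    """Per-pixel anomaly score as cosine distance between a backbone's
    feature map and its reconstruction (higher = more anomalous).

    features, reconstructed: arrays of shape (H, W, C).
    Returns an (H, W) anomaly map in [0, 2].
    """
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    r = reconstructed / (np.linalg.norm(reconstructed, axis=-1, keepdims=True) + 1e-8)
    return 1.0 - np.sum(f * r, axis=-1)

# Toy example: random "features" in place of a real backbone's output.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 16))
recon = feat.copy()
recon[2:4, 2:4] += 3.0  # a region the model fails to reconstruct -> anomalous
amap = anomaly_map(feat, recon)
# The (2:4, 2:4) block scores higher than the rest, mimicking how an
# unfamiliar obstacle would light up in the anomaly map.
```

In practice the map is upsampled to image resolution and thresholded; the evaluated methods differ mainly in how the feature extractor and the normality model (reconstruction, distillation, memory bank, etc.) are built.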