VAGNet: Vision-based accident anticipation with global features

arXiv cs.CV / 4/13/2026

Key Points

  • The paper introduces VAGNet, a deep neural network for anticipating traffic accidents from dashcam video using global scene features instead of computationally expensive object-level features.
  • VAGNet combines transformer and graph modules and leverages the vision foundation model VideoMAE-V2 to extract global representations for real-time hazard prediction.
  • Experiments on four benchmark datasets (DAD, DoTA, DADA, Nexar) report improved average precision and mean time-to-accident over prior approaches.
  • The method is claimed to be more computationally efficient, making it more suitable for real-time deployment in advanced driver assistance and autonomous driving systems.
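Average precision, one of the two metrics reported above, measures how well a model's accident scores rank accident clips above normal ones. The following is a minimal sketch, assuming one score per video and binary labels. The function name `average_precision` and the toy inputs are illustrative and not taken from the paper, whose exact evaluation protocol may differ.

```python
def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve.

    scores: one predicted accident score per video (higher = more likely).
    labels: 1 for an accident video, 0 for a normal video.
    Tied scores are kept in input order; no special tie handling is done.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")

    # Rank videos from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision at this rank; each positive adds a recall step of 1/n_pos.
            precision_sum += hits / rank
    return precision_sum / n_pos


if __name__ == "__main__":
    # Hypothetical scores for four videos, two of which contain accidents.
    print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

A perfect ranking yields 1.0, and the value drops as normal videos are scored above accident videos.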

Abstract

Traffic accidents are a leading cause of fatalities and injuries across the globe. Therefore, the ability to anticipate hazardous situations in advance is essential. Automated accident anticipation enables timely intervention through driver alerts and collision avoidance maneuvers, forming a key component of advanced driver assistance systems. In autonomous driving, such predictive capabilities support proactive safety behaviors, such as initiating defensive driving and human takeover when required. Using dashcam video as input offers a cost-effective solution, but it is challenging due to the complexity of real-world driving scenes. Accident anticipation systems need to operate in real time. However, current methods involve extracting features from each detected object, which is computationally intensive. We propose VAGNet, a deep neural network that learns to predict accidents from dashcam video using global features of traffic scenes without requiring explicit object-level features. The network consists of transformer and graph modules, and we use the vision foundation model VideoMAE-V2 for global feature extraction. Experiments on four benchmark datasets (DAD, DoTA, DADA, and Nexar) show that our method anticipates accidents with higher average precision and mean time-to-accident while being more computationally efficient than existing methods.
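To make the time-to-accident metric concrete, the sketch below computes it from per-frame accident scores. It rests on several assumptions that are not stated in the abstract: a fixed decision threshold of 0.5, a known accident onset frame for each positive video, and a time-to-accident of 0 for videos where the score never crosses the threshold before the accident. The names `time_to_accident` and `mean_tta` are hypothetical, and some evaluation protocols average over multiple thresholds instead of using a single one.

```python
def time_to_accident(frame_scores, accident_frame, fps, threshold=0.5):
    """Seconds between the first alert and the accident onset.

    frame_scores: per-frame accident scores for one positive video.
    accident_frame: index of the frame where the accident begins (assumed known).
    Returns 0.0 if no frame before the accident reaches the threshold
    (an assumption of this sketch).
    """
    for i, score in enumerate(frame_scores[:accident_frame]):
        if score >= threshold:
            return (accident_frame - i) / fps
    return 0.0


def mean_tta(videos, fps, threshold=0.5):
    """Average time-to-accident over (frame_scores, accident_frame) pairs."""
    ttas = [time_to_accident(s, a, fps, threshold) for s, a in videos]
    return sum(ttas) / len(ttas)


if __name__ == "__main__":
    # Hypothetical scores at 2 frames per second; the accident starts at frame 4.
    early_alert = ([0.1, 0.2, 0.6, 0.9, 0.95], 4)
    missed = ([0.1, 0.1, 0.2, 0.3, 0.9], 4)
    print(mean_tta([early_alert, missed], fps=2))
```

A larger value means the model raises its alert earlier. The metric is only meaningful alongside average precision, since a model that alerts on every frame would maximize it trivially.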