Learning from the Unseen: Generative Data Augmentation for Geometric-Semantic Accident Anticipation

arXiv cs.CV / 5/4/2026

📰 NewsSignals & Early TrendsModels & Research

Key Points

  • The paper targets accident anticipation in autonomous driving, highlighting that modeling road-user interactions is difficult and that large, diverse datasets are scarce.
  • It introduces a dual-path framework combining prompt-guided video synthesis for generating realistic synthetic scenes with statistical distributions learned from existing data.
  • It also proposes a semantic-cue-enhanced graph neural network that reasons dynamically over spatial and semantic relationships among participants.
  • The authors release a new benchmark dataset with standardized, finely annotated video sequences spanning varied regions, weather, and traffic conditions.
  • Experiments on existing datasets and the new benchmark show improved accuracy and earlier anticipation, suggesting the approach alleviates current data bottlenecks and improves reliability.

Abstract

Anticipating traffic accidents is a critical yet unresolved problem for autonomous driving, hindered by the inherent complexity of modeling interactions between road users and the limited availability of diverse, large-scale datasets. To address these issues, we propose a dual-path framework. On the one hand, we employ a video synthesis pipeline that, guided by structured prompts, derives feature distributions from existing corpora and produces high-fidelity synthetic driving scenes consistent with the statistical patterns of real data. On the other hand, we design a graph neural network enriched with semantic cues, enabling dynamic reasoning over both spatial and semantic relations among participants. To validate the effectiveness of our approach, we release a new benchmark dataset containing standardized, finely annotated video sequences that cover a broad spectrum of regions, weather, and traffic conditions. Evaluations across existing datasets and our new benchmark confirm notable gains in both accuracy and anticipation lead time, highlighting the capacity of the proposed framework to mitigate current data bottlenecks and enhance the reliability of autonomous driving systems.