An End-to-End Decision-Aware Multi-Scale Attention-Based Model for Explainable Autonomous Driving

arXiv cs.CV / 5/4/2026


Key Points

  • The paper argues that deep learning’s “black-box” nature limits trustworthy deployment in fully automated driving, especially for understanding decision-making and anticipating failures.
  • It proposes an end-to-end, multi-scale attention-based model that feeds driving decisions into the reasoning component to generate decision-specific, case-based explanations (see the sketch after this list).
  • For evaluation, the authors use the standard F1-score and introduce a new “Joint F1 score” metric aimed at measuring accurate and reliable Explainable AI (XAI) performance.
  • The approach is tested on BDD-OIA and further validated on the nu-AR dataset to assess generalization and robustness, with results indicating improved reasoning performance versus classic and state-of-the-art methods.
  • The overall contribution is a more dependable framework for interpreting autonomous driving models, intended to support safer real-world adoption of explainable systems.
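
Since the summary only outlines the architecture, here is a minimal PyTorch sketch of the general idea, not the authors' implementation: self-attention is applied at several feature scales, and the predicted decision logits are concatenated into the reasoning branch so each explanation is conditioned on the decision. The module and parameter names (`DecisionAwareReasoner`, `feat_dim`, the three-scale setup) are hypothetical; only the label counts follow BDD-OIA's 4 action and 21 explanation categories.

```python
# Hypothetical sketch of a decision-conditioned reasoning head. Names and
# shapes are illustrative; this is not the paper's exact architecture.
import torch
import torch.nn as nn

class DecisionAwareReasoner(nn.Module):
    def __init__(self, feat_dim=256, num_decisions=4, num_explanations=21):
        super().__init__()
        # One self-attention block per feature scale (e.g. three backbone stages).
        self.scale_attn = nn.ModuleList(
            [nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)
             for _ in range(3)]
        )
        self.decision_head = nn.Linear(feat_dim, num_decisions)
        # The reasoning branch consumes the fused features *and* the decision logits.
        self.explain_head = nn.Linear(feat_dim + num_decisions, num_explanations)

    def forward(self, feats):
        # feats: list of 3 tensors, each [batch, num_tokens_i, feat_dim]
        pooled = []
        for attn, f in zip(self.scale_attn, feats):
            out, _ = attn(f, f, f)           # self-attention at one scale
            pooled.append(out.mean(dim=1))   # pool tokens -> [batch, feat_dim]
        fused = torch.stack(pooled, dim=0).mean(dim=0)  # fuse the scales
        decisions = self.decision_head(fused)           # action logits
        # Feed the decisions into the reasoning component (decision-aware XAI).
        explanations = self.explain_head(torch.cat([fused, decisions], dim=-1))
        return decisions, explanations
```

Conditioning the explanation head on the decision logits is what makes the output decision-specific: the same scene can yield different rationales depending on which action the model predicts.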

Abstract

Computer vision is being applied to an ever-wider range of domains, typically through deep learning models that are black boxes by nature. Without the ability to explain the behavior of neural networks, especially their decision-making processes, it is impossible to assess their effectiveness, anticipate system failures, or deploy them responsibly in real-world applications. Because deep learning is unavoidable in fully automated driving systems, many methods have been proposed to explain their behavior; however, these suffer from flawed reasoning and unreliable metrics, which have prevented a comprehensive understanding of the complex models used in autonomous vehicles and hindered the development of truly reliable systems. In this study, we propose a multi-scale attention-based model in which driving decisions are fed into the reasoning component, so that a case-specific explanation is generated for each decision simultaneously. For quantitative evaluation of the model's performance we employ the F1-score, and we also propose a new metric, the Joint F1 score, to demonstrate accurate and reliable performance in terms of Explainable Artificial Intelligence (XAI). In addition to the BDD-OIA dataset, the nu-AR dataset is used to further validate the generalization capability and robustness of the proposed network. The results demonstrate the superiority of our reasoning network over classic and state-of-the-art models.
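
The abstract does not spell out how the Joint F1 score is computed, so the snippet below illustrates one plausible reading under a stated assumption: a sample earns credit only insofar as its action and explanation labels are jointly correct, implemented here as a samples-averaged multi-label F1 over the concatenated action and explanation label vectors. The function name `joint_f1` and the concatenation scheme are assumptions, not the authors' formula.

```python
# Illustrative Joint F1 sketch -- assumes the metric couples action and
# explanation correctness; NOT the paper's exact definition.
import numpy as np
from sklearn.metrics import f1_score

def joint_f1(action_true, action_pred, expl_true, expl_pred):
    """Samples-averaged F1 over concatenated action + explanation
    multi-label indicator arrays, so a sample only scores well when
    both heads are right (hypothetical formulation)."""
    y_true = np.concatenate([action_true, expl_true], axis=1)
    y_pred = np.concatenate([action_pred, expl_pred], axis=1)
    return f1_score(y_true, y_pred, average="samples")

# Example: 2 samples, 4 action labels and 21 explanation labels (BDD-OIA sizes).
rng = np.random.default_rng(0)
a_true = rng.integers(0, 2, size=(2, 4));  a_pred = a_true.copy()
e_true = rng.integers(0, 2, size=(2, 21)); e_pred = e_true.copy()
print(joint_f1(a_true, a_pred, e_true, e_pred))  # perfect agreement -> 1.0
```

Whatever the paper's precise formulation, the point of such a joint metric is that a model cannot score well by getting decisions right while producing mismatched explanations, or vice versa.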