Attention-Augmented YOLOv8 with Ghost Convolution for Real-Time Vehicle Detection in Intelligent Transportation Systems

arXiv cs.CV / 4/28/2026

Key Points

  • The paper proposes a real-time vehicle detection model that extends YOLOv8n with three components, the Ghost Module, CBAM (Convolutional Block Attention Module), and DCNv2 (Deformable Convolutional Networks v2), to better handle clutter and geometric variation in traffic scenes.
  • The Ghost Module targets feature redundancy with efficient feature generation, while CBAM enhances representation quality using channel and spatial attention mechanisms.
  • DCNv2 is used to increase adaptability to differing vehicle shapes and structural deformations, aiming to improve robustness across complex environments.
  • On the KITTI dataset, the model reportedly reaches 95.4% mAP@0.5, which is an 8.97% improvement over the baseline YOLOv8n, alongside 96.2% precision, 93.7% recall, and a 94.93% F1-score.
  • Comparative experiments against seven state-of-the-art detectors and ablation studies indicate that the integrated modules consistently improve performance, with each component contributing both on its own and in combination.
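
The reported F1-score can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below does that arithmetic. It also shows which baseline mAP@0.5 the "8.97% improvement" would imply under two possible readings (percentage points versus relative gain), because the summary does not say which one is meant. The function names and the two readings are illustrative assumptions, not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_baseline(improved: float, gain: float, relative: bool) -> float:
    """Baseline score implied by an improved score and a stated gain.

    relative=False treats the gain as percentage points (improved - gain).
    relative=True treats it as a relative increase (improved / (1 + gain / 100)).
    """
    if relative:
        return improved / (1 + gain / 100)
    return improved - gain


# Precision and recall as reported in the summary above.
f1 = f1_score(0.962, 0.937) * 100
print(f"F1 from P and R: {f1:.2f}%")

# Two hypothetical readings of "8.97% improvement"; the source does not say which applies.
points = implied_baseline(95.4, 8.97, relative=False)
ratio = implied_baseline(95.4, 8.97, relative=True)
print(f"baseline if percentage points: {points:.2f}%")
print(f"baseline if relative gain:     {ratio:.2f}%")
```

The recomputed F1 rounds to 94.93%, which agrees with the reported figure. The two baseline values differ by about one point, so the distinction matters when comparing against other YOLOv8n results on KITTI.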

Abstract

Accurate vehicle detection is a critical component of autonomous driving, traffic surveillance, and intelligent transportation systems. This paper presents an enhanced YOLOv8n-based model that integrates the Ghost Module, Convolutional Block Attention Module (CBAM), and Deformable Convolutional Networks v2 (DCNv2) to improve detection performance. The Ghost Module reduces feature redundancy through efficient feature generation, CBAM refines feature representation via channel and spatial attention, and DCNv2 enhances adaptability to geometric variations in vehicle structures. Evaluated on the KITTI dataset, the proposed model achieves 95.4% mAP@0.5, representing an 8.97% improvement over the baseline YOLOv8n, along with 96.2% precision, 93.7% recall, and a 94.93% F1-score. Comparative analysis against seven state-of-the-art detectors demonstrates consistent superiority across key performance metrics, while ablation studies validate the individual and combined contributions of the integrated modules. By addressing feature redundancy, attention refinement, and spatial adaptability, the proposed approach offers a robust and computationally efficient solution for vehicle detection in diverse and complex traffic environments.
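
To make the "efficient feature generation" claim concrete, the following sketch compares the weight count of an ordinary convolution with that of a Ghost Module, using the formulation from the original GhostNet design: a primary convolution produces a reduced set of intrinsic feature maps, and cheap depthwise operations generate the remaining "ghost" maps from them. The layer sizes, the ratio of 2, and the 3x3 cheap-operation kernel are assumptions chosen for illustration; the abstract does not state how the module is configured inside YOLOv8n. Biases and normalization parameters are ignored.

```python
import math


def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a plain k x k convolution (no bias)."""
    return c_in * c_out * k * k


def ghost_module_params(c_in: int, c_out: int, k: int, ratio: int = 2, d: int = 3) -> int:
    """Weights in a Ghost Module that yields c_out channels (no bias).

    ratio: how many output maps each intrinsic map accounts for (itself included).
    d:     kernel size of the cheap depthwise operations.
    """
    intrinsic = math.ceil(c_out / ratio)        # channels from the primary conv
    primary = c_in * intrinsic * k * k          # ordinary conv, fewer outputs
    cheap = intrinsic * (ratio - 1) * d * d     # one depthwise filter per ghost map
    return primary + cheap


# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
plain = standard_conv_params(64, 128, 3)
ghost = ghost_module_params(64, 128, 3, ratio=2, d=3)
print(f"standard conv: {plain} weights")
print(f"ghost module:  {ghost} weights")
print(f"reduction:     {plain / ghost:.2f}x")
```

For this hypothetical layer the counts are 73,728 versus 37,440 weights, a reduction close to the ratio of 2, which is the general pattern when the input channel count is much larger than the ratio.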