Adversarial Flow Matching for Imperceptible Attacks on End-to-End Autonomous Driving

arXiv cs.CV / 5/5/2026


Key Points

  • The paper argues that end-to-end autonomous driving (E2E AD) systems—whether monolithic VLA-style or modular—may share a vulnerability in their Transformer backbones, allowing visually imperceptible perturbations to trigger dangerous behaviors.
  • It introduces Adversarial Flow Matching (AFM), a gray-box adversarial attack method that generates adversarial examples efficiently in a single step using a neural average velocity field.
  • AFM is designed to produce attacks that are both effective and visually subtle by jointly perturbing the model’s generative latent space and its neural average velocity field.
  • Experiments show AFM strongly degrades performance of both VLA and modular AD agents across scenarios while achieving state-of-the-art visual imperceptibility compared with existing baselines.
  • The adversarial examples also transfer robustly across models, suggesting AFM approximates a black-box threat model while requiring only the prior knowledge that the target contains a Transformer-based module.

Abstract

Autonomous driving (AD) is evolving towards end-to-end (E2E) frameworks through two primary paradigms: monolithic models exemplified by Vision-Language-Action (VLA), and specialized modular architectures. Despite their divergent designs, both paradigms increasingly rely on Transformer backbones for complex reasoning, potentially introducing a shared vulnerability: visually imperceptible perturbations can manipulate E2E AD models into hazardous maneuvers by targeting the Transformer module. Most existing adversarial attacks against AD systems operate under white-box or black-box settings; the former require full model transparency, while the latter suffer from prohibitive query latency or limited attack transferability. In this paper, we propose Adversarial Flow Matching (AFM), a novel gray-box attack framework that exploits Transformer structural vulnerabilities in E2E AD models. AFM enables efficient one-step generation of adversarial examples via a neural average velocity field. Additionally, the proposed technique yields effective and visually imperceptible attacks by synergistically perturbing the generative latent space and the neural average velocity field. Extensive experiments demonstrate that AFM achieves a superior trade-off between attack effectiveness and imperceptibility: it substantially degrades the performance of both VLA and modular AD agents across various scenarios compared to baselines, while maintaining state-of-the-art visual imperceptibility. Furthermore, adversarial examples generated by AFM exhibit robust cross-model transferability, indicating that AFM closely approximates a black-box attack setting while requiring only the prior knowledge that the target AD model incorporates a Transformer-based module.
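The core mechanics described in the abstract can be illustrated with a toy sketch: a (MeanFlow-style) average velocity field supports one-step sampling, x = z + (t − r)·u(z, r, t), and an attacker can perturb the generative latent z before that single step. Everything below is an illustrative assumption, not the paper's implementation: the "network" is a fixed linear map, and the latent perturbation is a simple FGSM-style sign step standing in for AFM's joint perturbation of the latent space and the velocity field.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

# Stand-in for a trained average-velocity network u_theta(z, r, t);
# a fixed linear map keeps the sketch self-contained (assumption).
W = rng.standard_normal((DIM, DIM)) * 0.1
b = rng.standard_normal(DIM) * 0.01

def avg_velocity(z, r=0.0, t=1.0):
    """Average velocity over [r, t]; a real model would condition on (r, t)."""
    return z @ W.T + b

def one_step_sample(z, r=0.0, t=1.0):
    """One-step sampling with an average velocity field: x = z + (t - r) * u."""
    return z + (t - r) * avg_velocity(z, r, t)

def perturb_latent(z, grad, eps=0.05):
    """FGSM-style latent perturbation (illustrative surrogate for AFM's
    joint latent / velocity-field perturbation)."""
    return z + eps * np.sign(grad)

z = rng.standard_normal(DIM)
x_clean = one_step_sample(z)

# Placeholder gradient of an attack loss w.r.t. z; in practice this would
# come from backpropagating through the target's Transformer module.
fake_grad = rng.standard_normal(DIM)
x_adv = one_step_sample(perturb_latent(z, fake_grad))

# Small epsilon keeps the adversarial sample close to the clean one,
# which is the imperceptibility side of the trade-off.
print(np.linalg.norm(x_adv - x_clean))
```

The one-step sampler is what makes the attack cheap at inference time: a single forward pass through the average velocity field replaces the many integration steps an ordinary flow or diffusion sampler would need.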