FlowAD: Ego-Scene Interactive Modeling for Autonomous Driving

arXiv cs.CV / 3/17/2026

Key Points

  • FlowAD proposes an ego-scene interactive modeling paradigm that represents ego-scene interaction as scene flow relative to the ego-vehicle to account for ego-motion feedback in the learning process.
  • The framework constructs basic flow units via an ego-guided scene partition shaped by the ego-vehicle's forward direction and steering velocity, and then predicts spatial and temporal flow to model scene flow dynamics.
  • The approach enables task-aware enhancements across perception, end-to-end planning, and vision-language model analysis by leveraging learned spatio-temporal flow dynamics.
  • Experiments on nuScenes and Bench2Drive show FlowAD achieving a 19% reduction in collision rate over SparseDrive, FCP improvements of 1.39 frames (60%) on nuScenes, and a driving score of 51.77 on Bench2Drive.
  • The authors state that code, models, and configurations will be released, enabling replication and reuse.
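The core idea, scene flow expressed relative to the ego-vehicle with flow units built from an ego-guided partition, can be illustrated with a toy sketch. This is not the paper's code: the function names, the sector-based partition rule, the steering-bias scaling, and the grid-free point representation are all assumptions made for illustration.

```python
import numpy as np

def ego_guided_partition(points, heading, steering_vel, n_sectors=8):
    """Assign BEV points to angular sectors oriented by the ego heading and
    biased toward the steering direction (assumed partition rule)."""
    bias = 0.1 * steering_vel  # assumed scaling of the steering-velocity bias
    angles = np.arctan2(points[:, 1], points[:, 0]) - heading - bias
    sector = np.floor((angles % (2 * np.pi)) / (2 * np.pi / n_sectors))
    return sector.astype(int)

def relative_scene_flow(pts_t0, pts_t1, ego_t0, ego_t1):
    """Scene flow relative to the ego-vehicle: world-frame point displacement
    minus the ego displacement over the same interval."""
    return (pts_t1 - pts_t0) - (ego_t1 - ego_t0)

# Example: a point moving exactly like the ego has zero relative flow.
pts_t0 = np.array([[5.0, 0.0]])
pts_t1 = np.array([[6.0, 0.0]])
flow = relative_scene_flow(pts_t0, pts_t1, np.array([0.0, 0.0]), np.array([1.0, 0.0]))
print(flow)  # → [[0. 0.]]
```

The point of the relative formulation is that ego-motion feedback appears directly in the learned signal: a static obstacle acquires nonzero relative flow as soon as the ego moves, which is exactly the interaction the paper argues standard world-frame modeling misses.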

Abstract

Effective environment modeling is the foundation for autonomous driving, underpinning tasks from perception to planning. However, current paradigms often inadequately consider the feedback of ego motion to the observation, which leads to an incomplete understanding of the driving process and consequently limits the planning capability. To address this issue, we introduce a novel ego-scene interactive modeling paradigm. Inspired by human recognition, the paradigm represents ego-scene interaction as the scene flow relative to the ego-vehicle. This conceptualization allows for modeling ego-motion feedback within a feature learning pattern, advantageously utilizing existing log-replay datasets rather than relying on scenario simulations. We specifically propose FlowAD, a general flow-based framework for autonomous driving. Within it, an ego-guided scene partition first constructs basic flow units to quantify scene flow. The ego-vehicle's forward direction and steering velocity directly shape the partition, which reflects ego motion. Then, based on flow units, spatial and temporal flow predictions are performed to model the dynamics of scene flow, encompassing both spatial displacement and temporal variation. The final task-aware enhancement exploits learned spatio-temporal flow dynamics to benefit diverse tasks through object- and region-level strategies. We also propose a novel Frames before Correct Planning (FCP) metric to assess scene understanding capability. Experiments in both open- and closed-loop evaluations demonstrate FlowAD's generality and effectiveness across perception, end-to-end planning, and VLM analysis. Notably, FlowAD reduces the collision rate by 19% over SparseDrive, with FCP improvements of 1.39 frames (60%) on nuScenes, and achieves a driving score of 51.77 on Bench2Drive, demonstrating its superiority. Code, model, and configurations will be released.
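The abstract introduces the FCP metric only at a high level. The sketch below assumes one plausible reading, the number of frames elapsed before the planner first produces a correct plan (lower is better), which may differ from the paper's exact definition.

```python
def frames_before_correct_planning(correct):
    """Assumed reading of FCP: count the frames before the planner first
    produces a correct plan. `correct` is a per-frame boolean sequence;
    if no frame is correct, return the full horizon length."""
    for i, ok in enumerate(correct):
        if ok:
            return i
    return len(correct)

# Example: the plan becomes correct at the third frame (index 2).
print(frames_before_correct_planning([False, False, True, True]))  # → 2
```

Under this reading, the reported improvement of 1.39 frames means FlowAD's planner converges to a correct plan roughly 1.39 frames sooner than the baseline on average.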