Self-Discovered Intention-aware Transformer for Multi-modal Vehicle Trajectory Prediction

arXiv cs.RO / 4/9/2026

Key Points

  • The paper proposes a pure Transformer-based, intention-aware multimodal model for vehicle trajectory prediction that jointly considers neighboring vehicles without relying on fixed graph structures or explicit intention labels.
  • It uses a two-track architecture: one track generates future trajectory distributions, while the other predicts the likelihood of different intentions for each scenario.
  • The authors report that separating the spatial reasoning component from the trajectory-generation component improves overall predictive performance.
  • The model is designed to learn an ordered set of candidate future trajectories by predicting residual offsets among K trajectory hypotheses.
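
The residual-offset idea in the last point can be illustrated with a minimal sketch. The summary does not say how the offsets are applied, so the chaining rule used here (hypothesis k equals hypothesis k-1 plus a per-timestep offset) is an assumption, and the names `decode_hypotheses`, `base`, and `offsets` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # (x, y) position at one future timestep
Trajectory = List[Point]      # one position per timestep


def decode_hypotheses(base: Trajectory, offsets: List[Trajectory]) -> List[Trajectory]:
    """Build K trajectories from one base trajectory and K-1 residual offsets.

    Assumption (not confirmed by the source): each hypothesis is the previous
    hypothesis plus a per-timestep (dx, dy) offset, which yields an ordered set.
    """
    hypotheses = [list(base)]
    for offset in offsets:
        if len(offset) != len(base):
            raise ValueError("each offset must cover the same horizon as the base")
        prev = hypotheses[-1]
        # Add the residual to the previous hypothesis, timestep by timestep.
        hypotheses.append(
            [(px + dx, py + dy) for (px, py), (dx, dy) in zip(prev, offset)]
        )
    return hypotheses


# Toy example: a straight path plus two offsets that drift progressively sideways.
base = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
offsets = [
    [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0)],
    [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0)],
]
trajectories = decode_hypotheses(base, offsets)
```

In this toy case the three hypotheses are ordered by lateral displacement, because every offset pushes in the same direction. A trained model would not be constrained to offsets of this form.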

Abstract

Predicting vehicle trajectories plays an important role in autonomous driving and intelligent transportation system (ITS) applications. Although many deep learning algorithms have been devised to predict vehicle trajectories, their reliance on a specific graph structure (e.g., a Graph Neural Network) or on explicit intention labeling limits their flexibility. In this study, we propose a pure Transformer-based network that produces multi-modal predictions while taking neighboring vehicles into account. Two separate tracks are employed: one predicts the trajectories, while the other predicts the likelihood of each intention given the neighboring vehicles. We find that the two-track design improves performance by separating the spatial module from the trajectory-generating module. We also find that the model can learn an ordered group of trajectories by predicting residual offsets among K trajectories.
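
As a rough illustration of how the two tracks' outputs might be combined at inference time, the sketch below turns per-intention scores into likelihoods with a softmax and ranks the candidate trajectories by them. The one-to-one pairing of intentions with the K trajectories, the use of a softmax, and the names `softmax` and `rank_modes` are assumptions made for this example. The abstract does not describe this step.

```python
import math
from typing import List, Sequence, Tuple

Trajectory = List[Tuple[float, float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities (max-shifted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_modes(
    trajectories: List[Trajectory], intention_logits: Sequence[float]
) -> List[Tuple[float, Trajectory]]:
    """Pair each candidate trajectory with an intention likelihood, most likely first.

    Assumption: intention k corresponds to trajectory k (hypothetical pairing).
    """
    if len(trajectories) != len(intention_logits):
        raise ValueError("need exactly one intention score per trajectory")
    probs = softmax(intention_logits)
    order = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
    return [(probs[k], trajectories[k]) for k in order]


# Toy outputs: three candidate paths (trajectory track) and three scores
# (intention track). All values are made up for illustration.
keep_lane = [(1.0, 0.0), (2.0, 0.0)]
turn_left = [(1.0, 0.5), (2.0, 1.5)]
turn_right = [(1.0, -0.5), (2.0, -1.5)]
ranked = rank_modes([keep_lane, turn_left, turn_right], [0.0, 2.0, 1.0])
best_probability, best_trajectory = ranked[0]
```

Keeping the scoring separate from trajectory generation mirrors the two-track split described in the abstract, though the actual network components are not reproduced here.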