RetroMotion: Retrocausal Motion Forecasting Models are Instructable

arXiv cs.CV / 4/30/2026


Key Points

  • RetroMotion introduces retrocausal motion forecasting models that transfer information from later points in marginal trajectories to earlier points in joint trajectories, improving the modeling of multi-agent interactions.
  • The approach reduces the exponential growth of the joint trajectory output space by decomposing forecasting into marginal distributions for each agent and joint distributions only for interacting pairs.
  • It uses a transformer pipeline that re-encodes marginal distributions and then performs pairwise joint modeling to generate the final joint trajectory distributions.
  • For uncertainty at each time step, RetroMotion models positional uncertainty with compressed exponential power distributions.
  • The method performs strongly on the Waymo Interaction Prediction Challenge and generalizes to Argoverse 2 and V2X-Seq; it also provides an instruction interface, with instruction-following emerging implicitly from standard motion forecasting training and adapting to scene context.
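The output-space reduction from the second point can be made concrete with a quick mode count. The sketch below (illustrative only, not from the paper's code; function names are hypothetical) compares a full joint distribution over all agents with the marginal-plus-pairwise decomposition:

```python
# A full joint distribution over N agents with K trajectory modes per
# agent has K**N joint modes (exponential in N). RetroMotion-style
# decomposition instead keeps N marginal distributions plus joint
# distributions only for interacting pairs (K**2 modes per pair).

def full_joint_modes(num_agents: int, modes_per_agent: int) -> int:
    """Modes in one joint distribution over all agents."""
    return modes_per_agent ** num_agents

def decomposed_modes(num_agents: int, num_interacting_pairs: int,
                     modes_per_agent: int) -> int:
    """Marginal modes for every agent plus pairwise joint modes."""
    return (num_agents * modes_per_agent
            + num_interacting_pairs * modes_per_agent ** 2)

print(full_joint_modes(8, 6))      # 6**8 = 1_679_616
print(decomposed_modes(8, 3, 6))   # 8*6 + 3*6**2 = 156
```

With 8 agents, 6 modes each, and 3 interacting pairs, the decomposition shrinks the output space from about 1.7 million joint modes to 156.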

Abstract

Motion forecasts of road users (i.e., agents) vary in complexity depending on the number of agents, scene constraints, and interactions. In particular, the output space of joint trajectory distributions grows exponentially with the number of agents. Therefore, we decompose multi-agent motion forecasts into (1) marginal distributions for all modeled agents and (2) joint distributions for interacting agents. Using a transformer model, we generate joint distributions by re-encoding marginal distributions followed by pairwise modeling. This incorporates a retrocausal flow of information from later points in marginal trajectories to earlier points in joint trajectories. For each time step, we model the positional uncertainty using compressed exponential power distributions. Notably, our method achieves strong results in the Waymo Interaction Prediction Challenge and generalizes well to the Argoverse 2 and V2X-Seq datasets. Additionally, our method provides an interface for issuing instructions. We show that standard motion forecasting training implicitly enables the model to follow instructions and adapt them to the scene context. GitHub repository: https://github.com/kit-mrt/future-motion