TIGFlow-GRPO: Trajectory Forecasting via Interaction-Aware Flow Matching and Reward-Driven Optimization

arXiv cs.AI · March 27, 2026


Key Points

  • The paper introduces TIGFlow-GRPO, a two-stage framework for human trajectory forecasting that explicitly aligns generated trajectories with behavioral rules and scene constraints rather than relying mainly on supervised fitting.
  • In the first stage, it builds a Conditional Flow Matching (CFM) predictor enhanced with a Trajectory-Interaction-Graph (TIG) module to better encode agent–agent and agent–scene interactions from spatio-temporal observations.
  • In the second stage, it applies Flow-GRPO post-training, converting the deterministic flow rollout into stochastic ODE-to-SDE sampling to encourage exploration of multimodal futures (both mechanisms are sketched in code after this list).
  • Training uses a composite reward combining view-aware social compliance and map-aware physical feasibility, with GRPO progressively steering predictions toward behaviorally plausible outcomes.
  • Experiments on ETH/UCY and SDD demonstrate improved forecasting accuracy, more stable long-horizon behavior, and trajectories that are both socially compliant and physically feasible.

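For readers who want the mechanics, the sketch below illustrates the two pieces the key points describe: a standard Conditional Flow Matching regression objective, and a simplified ODE-to-SDE rollout that injects noise for exploration. This is a minimal PyTorch sketch under assumed shapes; `velocity_net` and all hyperparameters are placeholders rather than the paper's implementation, and the exact marginal-preserving SDE conversion in Flow-GRPO adds a score-based drift correction that is omitted here.

```python
# Minimal sketch, assuming x1 is a batch of ground-truth trajectories of
# shape (B, T, 2) and velocity_net(x_t, t, cond) is a hypothetical velocity
# model; nothing here is the paper's actual implementation.
import torch


def cfm_loss(velocity_net, x1, cond):
    """One Conditional Flow Matching step: regress the velocity field along
    a straight interpolation path from noise x0 to the target trajectory x1."""
    x0 = torch.randn_like(x1)              # noise endpoint of the path
    t = torch.rand(x1.shape[0], 1, 1)      # per-sample time in [0, 1]
    x_t = (1.0 - t) * x0 + t * x1          # linear interpolant
    target_v = x1 - x0                     # constant target velocity
    pred_v = velocity_net(x_t, t, cond)
    return ((pred_v - target_v) ** 2).mean()


@torch.no_grad()
def sde_rollout(velocity_net, cond, shape, steps=20, sigma=0.1):
    """Exploratory sampling: follow the learned flow as in ODE integration,
    but inject Gaussian noise at each Euler-Maruyama step so repeated
    rollouts produce distinct futures for GRPO to score."""
    x = torch.randn(shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0], 1, 1), i * dt)
        v = velocity_net(x, t, cond)
        x = x + v * dt + sigma * (dt ** 0.5) * torch.randn_like(x)
    return x
```

The key design point is that the same learned velocity field serves both roles: deterministic integration for accurate prediction, and noisy integration to generate the diverse candidate groups the reward stage needs.
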
Abstract

Human trajectory forecasting is important for intelligent multimedia systems operating in visually complex environments, such as autonomous driving and crowd surveillance. Although Conditional Flow Matching (CFM) has shown a strong ability to model trajectory distributions from spatio-temporal observations, existing approaches still focus primarily on supervised fitting, which may leave social norms and scene constraints insufficiently reflected in generated trajectories. To address this issue, we propose TIGFlow-GRPO, a two-stage generative framework that aligns flow-based trajectory generation with behavioral rules. In the first stage, we build a CFM-based predictor with a Trajectory-Interaction-Graph (TIG) module to model fine-grained visual-spatial interactions and strengthen context encoding. This stage captures both agent–agent and agent–scene relations more effectively, providing more informative conditional features for subsequent alignment. In the second stage, we perform Flow-GRPO post-training, where the deterministic flow rollout is reformulated as stochastic ODE-to-SDE sampling to enable trajectory exploration, and a composite reward combines view-aware social compliance with map-aware physical feasibility. By evaluating trajectories explored through SDE rollout, GRPO progressively steers multimodal predictions toward behaviorally plausible futures. Experiments on the ETH/UCY and SDD datasets show that TIGFlow-GRPO improves forecasting accuracy and long-horizon stability while generating trajectories that are more socially compliant and physically feasible. These results suggest that the proposed framework offers an effective way to connect flow-based trajectory modeling with behavior-aware alignment in dynamic multimedia environments.
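
The post-training stage is easiest to see through its scoring loop. Below is a hedged sketch in the same PyTorch style: the two reward terms are toy stand-ins for the paper's view-aware social term and map-aware feasibility term (the actual reward functions are not specified here), followed by GRPO's group-relative advantage normalization. All function names, weights, and shapes are illustrative assumptions.

```python
# Hedged sketch of the GRPO scoring step: a group of SDE rollouts for the
# same scene is scored with a composite reward, then normalized against the
# group. The reward terms are toy placeholders, not the paper's definitions.
import torch


def social_compliance(traj, others, min_dist=0.5):
    """Toy stand-in: penalize waypoints that come closer than min_dist
    (meters) to any other agent's position."""
    d = torch.cdist(traj, others)          # (T, N) pairwise distances
    return -torch.relu(min_dist - d).mean()


def physical_feasibility(traj, walkable_mask, grid_res=1.0):
    """Toy stand-in: fraction of waypoints landing on walkable cells of a
    square occupancy grid."""
    idx = (traj / grid_res).long().clamp(0, walkable_mask.shape[0] - 1)
    return walkable_mask[idx[:, 0], idx[:, 1]].float().mean()


def composite_reward(traj, others, walkable_mask, w_social=0.5, w_phys=0.5):
    return (w_social * social_compliance(traj, others)
            + w_phys * physical_feasibility(traj, walkable_mask))


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO's group baseline: normalize each rollout's reward against the
    group of rollouts sampled for the same conditioning input."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)


# Usage: score a group of G stochastic rollouts per scene, then weight the
# policy-gradient update by the normalized advantages.
# rewards = torch.stack([composite_reward(r, others, mask) for r in rollouts])
# advantages = group_relative_advantages(rewards.unsqueeze(0))
```

Because the baseline here is the group mean rather than a learned critic, the update is only well-defined when several distinct rollouts exist per scene, which is exactly what the stochastic ODE-to-SDE sampling provides.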