Beyond Where to Look: Trajectory-Guided Reinforcement Learning for Multimodal RLVR

arXiv cs.CV / 3/30/2026


Key Points

  • The paper identifies a key RLVR limitation in multimodal LLMs: models may attend to the relevant visual regions yet often fail to carry that visual evidence into the reasoning that follows.
  • It proposes Trajectory-Guided Reinforcement Learning (TGRL), which uses expert reasoning trajectories from stronger models to steer the policy toward fine-grained, visually grounded reasoning.
  • The method includes token-level reweighting and trajectory filtering to stabilize and improve reinforcement learning optimization.
  • Experiments across multiple multimodal reasoning benchmarks show TGRL consistently improves reasoning performance and reduces the disconnect between visual perception and logical reasoning.

Abstract

Recent advances in Reinforcement Learning with Verifiable Rewards (RLVR) for multimodal large language models (MLLMs) have mainly focused on improving final answer correctness and strengthening visual grounding. However, a critical bottleneck remains: although models can attend to relevant visual regions, they often fail to effectively incorporate visual evidence into subsequent reasoning, leading to reasoning chains that are weakly grounded in visual facts. To address this issue, we propose Trajectory-Guided Reinforcement Learning (TGRL), which guides the policy model to integrate visual evidence into fine-grained reasoning processes using expert reasoning trajectories from stronger models. We further introduce token-level reweighting and trajectory filtering to ensure stable and effective policy optimization. Extensive experiments on multiple multimodal reasoning benchmarks demonstrate that TGRL consistently improves reasoning performance and effectively bridges the gap between visual perception and logical reasoning.
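
The abstract names trajectory filtering and token-level reweighting but does not say how either is computed. As a rough illustration only, the sketch below assumes one plausible reading: expert trajectories are kept only when their final answer passes the verifiable check, and the policy is trained on the kept ones with a per-token weighted negative log-likelihood. The `Trajectory` class, its fields, the filtering rule, and the weighting scheme are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """Hypothetical container for one expert reasoning trajectory."""
    token_logprobs: List[float]  # policy log-probabilities of the expert tokens
    token_weights: List[float]   # assumed per-token importance weights
    answer_correct: bool         # verifiable reward: final answer matches the reference


def filter_trajectories(trajs: List[Trajectory]) -> List[Trajectory]:
    """Assumed filtering rule: keep non-empty trajectories with a verified answer."""
    return [t for t in trajs if t.answer_correct and t.token_logprobs]


def reweighted_nll(traj: Trajectory) -> float:
    """Weighted negative log-likelihood, normalized by the total token weight."""
    total_weight = sum(traj.token_weights)
    if total_weight == 0:
        return 0.0
    weighted = sum(w * lp for w, lp in zip(traj.token_weights, traj.token_logprobs))
    return -weighted / total_weight


def guidance_loss(trajs: List[Trajectory]) -> float:
    """Average the reweighted loss over the trajectories that survive filtering."""
    kept = filter_trajectories(trajs)
    if not kept:
        return 0.0
    return sum(reweighted_nll(t) for t in kept) / len(kept)


if __name__ == "__main__":
    good = Trajectory([-1.0, -3.0], [1.0, 3.0], answer_correct=True)
    bad = Trajectory([-0.5, -0.5], [1.0, 1.0], answer_correct=False)
    print(guidance_loss([good, bad]))
```

In this sketch, tokens with larger weights (for example, tokens that state a visual fact) contribute more to the loss, while trajectories with an unverified answer contribute nothing. How the paper actually assigns weights or selects trajectories may differ.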
