ProMMSearchAgent: A Generalizable Multimodal Search Agent Trained with Process-Oriented Rewards

arXiv cs.CV / 4/23/2026

Key Points

  • The paper introduces ProMMSearchAgent, a multimodal search agent trained with process-oriented rewards to tackle sparse supervision and the unpredictability of live web environments.
  • It uses a Sim-to-Real training setup that moves policy learning off the live web and into a deterministic, local static sandbox, avoiding the unpredictability of training directly against live search results.
  • The approach adds an introspective, process-oriented reward that probes the limits of the agent's own parametric knowledge to generate dense guidance, rewarding the agent for initiating a multimodal or text search only when it is visually or factually uncertain.
  • Experiments show zero-shot transfer to the live Google Search API and report new state-of-the-art results, outperforming MMSearch-R1 across multiple benchmarks.
  • Reported gains include +5.1% on FVQA-test, +6.3% on InfoSeek, and +11.3% on MMSearch, indicating strong generalization for knowledge-intensive visual reasoning.
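
The process-oriented reward described in the points above can be illustrated with a minimal Python sketch. The summary does not give the paper's actual reward formula, so everything here is an assumption made for illustration: the `Probe` fields, the three action names, the 0.5 bonus, and the additive combination with the outcome reward are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Hypothetical result of probing the policy with no search tools available."""
    recognizes_image: bool  # could the model identify the visual content unaided?
    knows_fact: bool        # could the model supply the required fact unaided?


def expected_action(probe: Probe) -> str:
    """Map the probed knowledge boundary to the action we assume should be rewarded."""
    if not probe.recognizes_image:
        return "image_search"  # visually uncertain: multimodal search
    if not probe.knows_fact:
        return "text_search"   # factually uncertain: text search
    return "answer"            # nothing uncertain: answer without searching


def process_reward(probe: Probe, action: str, bonus: float = 0.5) -> float:
    """Dense signal: pay a bonus only when the chosen action matches the probe."""
    return bonus if action == expected_action(probe) else 0.0


def total_reward(probe: Probe, action: str, answer_correct: bool) -> float:
    """Sparse outcome reward plus the dense process reward (assumed additive)."""
    return float(answer_correct) + process_reward(probe, action)
```

Under these assumptions, an agent that answers directly when it already knows the entity and the fact earns the bonus, while one that skips a needed search earns none, even before the final answer is scored.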

Abstract

Training multimodal agents via reinforcement learning for knowledge-intensive visual reasoning is fundamentally hindered by the extreme sparsity of outcome-based supervision and the unpredictability of live web environments. To resolve these algorithmic and environmental bottlenecks, we introduce ProMMSearchAgent, establishing a novel Sim-to-Real training paradigm for multimodal search. We decouple policy learning from the live web, confining it to a deterministic, local static sandbox. Crucially, to learn effectively within this constrained environment, we propose an introspective process-oriented reward. By probing the agent's own parametric knowledge boundaries, we generate dense behavioral metadata that explicitly rewards the correct cognitive decision: initiating a multimodal or text search only when visually or factually uncertain. Extensive experiments demonstrate that our locally trained policy transfers zero-shot to the live Google Search API. ProMMSearchAgent achieves new SOTA performance, outperforming MMSearch-R1 by +5.1% on FVQA-test, +6.3% on InfoSeek, and +11.3% on MMSearch.
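
One way to picture the Sim-to-Real setup is a policy that talks to a search backend only through a narrow interface, so a deterministic local backend used in training can later be swapped for a live one. The sketch below is an assumption-laden illustration, not the paper's implementation: `SearchBackend`, `StaticSandbox`, `run_episode`, and the cache contents are hypothetical names and placeholder data.

```python
from __future__ import annotations

from typing import Protocol


class SearchBackend(Protocol):
    """Hypothetical interface shared by the training sandbox and a live backend."""

    def search(self, query: str) -> list[str]: ...


class StaticSandbox:
    """Deterministic backend: replays results from a fixed local cache."""

    def __init__(self, cache: dict[str, list[str]]) -> None:
        # Normalize keys once so lookups are case- and whitespace-insensitive.
        self._cache = {key.strip().lower(): list(hits) for key, hits in cache.items()}

    def search(self, query: str) -> list[str]:
        # Return a copy so callers cannot mutate the cached results.
        return list(self._cache.get(query.strip().lower(), []))


def run_episode(backend: SearchBackend, query: str) -> str:
    """Toy stand-in for a rollout: return the top hit, or a fallback string."""
    hits = backend.search(query)
    return hits[0] if hits else "no result"


# Placeholder cache entries; real sandbox contents are not described in the summary.
sandbox = StaticSandbox({"example query": ["cached snippet A", "cached snippet B"]})
first = run_episode(sandbox, "Example Query")
second = run_episode(sandbox, "example query ")
```

Because the sandbox always returns the same results for the same query, repeated rollouts are reproducible. A live backend would implement the same `search` method against a real search service, which is the property a zero-shot transfer relies on in this sketch.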