ProMMSearchAgent: A Generalizable Multimodal Search Agent Trained with Process-Oriented Rewards
arXiv cs.CV / 4/23/2026
Key Points
- The paper introduces ProMMSearchAgent, a multimodal search agent trained with process-oriented rewards to tackle sparse supervision and the unpredictability of live web environments.
- It uses a Sim-to-Real training setup that decouples policy learning from the live web: the agent is trained in a deterministic, local, static sandbox, which improves stability compared with training directly against live search.
- The approach adds an introspective, process-based reward that probes the limits of the agent’s internal knowledge, producing dense per-step guidance on when to answer directly and when to initiate a text or multimodal search.
- Experiments show zero-shot transfer to the live Google Search API and report new state-of-the-art results, outperforming MMSearch-R1 across multiple benchmarks.
- Reported gains include +5.1% on FVQA-test, +6.3% on InfoSeek, and +11.3% on MMSearch, indicating strong generalization for knowledge-intensive visual reasoning.
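The introspective, process-based reward described in the points above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's actual implementation: the function name `process_reward`, the action labels, and the reward magnitudes are all hypothetical.

```python
def process_reward(knows_answer: bool, action: str, answer_correct: bool) -> float:
    """Illustrative dense reward combining a per-step process signal with a
    sparse outcome signal. `knows_answer` stands in for the knowledge probe:
    whether the model's internal knowledge suffices for this question."""
    if knows_answer:
        # If the probe says the model already knows the answer, the correct
        # cognitive action is to answer directly; searching wastes a step.
        step = 1.0 if action == "answer" else -0.5
    else:
        # If internal knowledge is insufficient, initiating a text or
        # multimodal (image) search is the correct cognitive action.
        step = 1.0 if action in ("text_search", "image_search") else -0.5
    # The sparse outcome reward still applies at the end of the trajectory.
    outcome = 1.0 if answer_correct else 0.0
    return step + outcome
```

In this sketch, the process term supplies supervision at every step rather than only at the final answer, which is how such rewards address the sparse-supervision problem the paper targets.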