COSEARCH: Joint Training of Reasoning and Document Ranking via Reinforcement Learning for Agentic Search
arXiv cs.AI / 4/21/2026
Key Points
- Agentic search has improved using reinforcement learning, but prior work often leaves the document retrieval/ranking component fixed while optimizing only the reasoning agent.
- The paper reports that replacing a fixed retrieval system with an oracle can yield up to a +26.8% relative F1 gain across seven QA benchmarks, indicating retrieval is a major bottleneck.
- It proposes CoSearch, which jointly trains a multi-step reasoning agent and a generative document ranker using Group Relative Policy Optimization (GRPO).
- GRPO computes advantages by comparing rollouts within a group, but the ranker's inputs vary across reasoning trajectories; to make grouping possible, the authors introduce a semantic grouping method that clusters sub-queries by token-level similarity, forming comparable groups without extra rollouts.
- Experiments on seven single-hop and multi-hop QA benchmarks show consistent improvements over strong baselines, and ablations confirm the contribution of each component, supporting joint training as a key ingredient for future search agents.
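The grouping-plus-advantage idea above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the Jaccard token-overlap measure, the greedy single-link clustering, and the `0.5` threshold are all assumptions standing in for whatever token-level similarity and clustering CoSearch actually uses; only the group-relative advantage (reward minus group mean, scaled by group std) follows the standard GRPO recipe.

```python
def token_jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two sub-queries
    (an assumed stand-in for the paper's similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def group_subqueries(subqueries, threshold=0.5):
    """Greedy single-link clustering: each sub-query joins the first
    existing group whose representative is similar enough, else it
    starts a new group. Threshold and strategy are illustrative."""
    groups = []  # list of lists of sub-queries
    for q in subqueries:
        for g in groups:
            if token_jaccard(q, g[0]) >= threshold:
                g.append(q)
                break
        else:
            groups.append([q])
    return groups

def grouped_advantages(rewards_by_group):
    """GRPO-style advantage: each reward minus its group's mean,
    normalized by the group's std (epsilon guards degenerate groups)."""
    advantages = []
    for rewards in rewards_by_group:
        mu = sum(rewards) / len(rewards)
        std = (sum((r - mu) ** 2 for r in rewards) / len(rewards)) ** 0.5
        advantages.append([(r - mu) / (std + 1e-8) for r in rewards])
    return advantages
```

The key property this buys: ranker rollouts from different trajectories can share a baseline as long as their sub-queries land in the same cluster, so no additional rollouts are needed just to form GRPO groups.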