Learning to Retrieve from Agent Trajectories
arXiv cs.AI / 4/8/2026
Key Points
- The paper argues that traditional learning-to-rank retrieval models trained on human click/dwell logs do not match the way LLM-powered search agents query and consume results in multi-turn loops.
- It proposes a new training paradigm, learning to retrieve from agent trajectories, where supervision is extracted from multi-step agent interactions rather than human-centric signals.
- By analyzing search agent trajectories, the authors identify behavioral signals indicative of document utility, such as browsing actions, unbrowsed rejections, and reasoning traces after browsing.
- They introduce LRAT, a framework that mines high-quality retrieval supervision from agent trajectories and incorporates relevance intensity through weighted optimization.
- Experiments on deep research benchmarks show that LRAT-trained retrievers improve evidence recall, end-to-end task success, and execution efficiency across different agent architectures and scales.
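
To make the idea in the points above concrete, here is a minimal sketch of how supervision might be mined from a trajectory and fed to a weighted contrastive loss. It is not the authors' implementation: the `SearchStep` and `TrainingExample` types, the `intensity_from_reasoning` heuristic (reasoning length as a proxy for relevance intensity), and the weight-normalized InfoNCE-style loss are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SearchStep:
    """One search turn of an agent trajectory (hypothetical schema)."""
    query: str
    retrieved: list[str]                                   # doc ids shown to the agent
    browsed: set[str] = field(default_factory=set)         # doc ids the agent opened
    reasoning: dict[str, str] = field(default_factory=dict)  # doc id -> post-browse reasoning


@dataclass
class TrainingExample:
    query: str
    positive: str
    negatives: list[str]
    weight: float


def intensity_from_reasoning(text: str) -> float:
    """Assumed proxy: more post-browse reasoning means higher weight, clipped to [0.1, 1.0]."""
    return min(1.0, max(0.1, len(text.split()) / 20))


def mine_examples(steps: list[SearchStep]) -> list[TrainingExample]:
    """Browsed documents become positives; retrieved-but-unbrowsed ones become negatives."""
    examples = []
    for step in steps:
        negatives = [d for d in step.retrieved if d not in step.browsed]
        if not negatives:
            continue  # no contrast available for this turn
        for doc in step.retrieved:
            if doc in step.browsed:
                weight = intensity_from_reasoning(step.reasoning.get(doc, ""))
                examples.append(TrainingExample(step.query, doc, negatives, weight))
    return examples


def weighted_contrastive_loss(
    examples: list[TrainingExample],
    score: Callable[[str, str], float],
    temperature: float = 1.0,
) -> float:
    """Weight-normalized InfoNCE-style loss over the mined examples."""
    total, weight_sum = 0.0, 0.0
    for ex in examples:
        logits = [score(ex.query, d) / temperature for d in [ex.positive, *ex.negatives]]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += ex.weight * (log_sum_exp - logits[0])
        weight_sum += ex.weight
    return total / weight_sum if weight_sum else 0.0
```

In practice `score` would be the retriever's query-document similarity and the loss would be computed on tensors so gradients can flow; the plain-Python version here only shows how the per-example weights enter the objective.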