Meta-Reinforcement Learning with Self-Reflection for Agentic Search
arXiv cs.LG / 3/13/2026
Key Points
- MR-Search introduces an in-context meta-reinforcement learning framework for agentic search that adapts its strategy across episodes by conditioning on past experiences.
- After each episode, the agent writes an explicit self-reflection that is added to the context for subsequent search attempts, enabling improved in-context exploration at test time.
- A novel multi-turn RL algorithm is proposed to estimate a dense relative advantage at the turn level, allowing fine-grained credit assignment across episodes.
- Empirical results show 9.2% to 19.3% performance gains over baselines across eight benchmarks, with strong generalization, and the authors release code and data at the linked GitHub repository.
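
To make the first three points concrete, here is a minimal sketch, not the authors' implementation. It assumes a cross-episode loop in which each attempt sees the answers and reflections of earlier attempts, and it stands in a simple group-relative normalization (each turn's reward compared against the other rollouts at the same turn) for the paper's turn-level advantage estimator, whose exact form is not given in this summary. All names (`run_with_reflection`, `attempt_fn`, `reflect_fn`, `turn_level_relative_advantages`) are hypothetical.

```python
from statistics import mean, pstdev


def run_with_reflection(question, attempt_fn, reflect_fn, num_episodes=3):
    """Run several episodes, feeding earlier answers and reflections forward.

    attempt_fn(question, context) -> answer      (hypothetical search agent)
    reflect_fn(question, answer)  -> reflection  (hypothetical self-critique)
    """
    context = []  # (answer, reflection) pairs from earlier episodes
    answers = []
    for _ in range(num_episodes):
        answer = attempt_fn(question, context)
        reflection = reflect_fn(question, answer)
        context.append((answer, reflection))
        answers.append(answer)
    return answers, context


def turn_level_relative_advantages(rewards, eps=1e-8):
    """Assumed estimator: normalize each turn's reward within the rollout group.

    rewards[g][t] is the reward of rollout g at turn (episode) t.
    Returns advantages with the same shape, giving a dense per-turn signal.
    """
    num_turns = len(rewards[0])
    if any(len(row) != num_turns for row in rewards):
        raise ValueError("all rollouts must have the same number of turns")
    advantages = [[0.0] * num_turns for _ in rewards]
    for t in range(num_turns):
        column = [row[t] for row in rewards]
        mu = mean(column)
        sigma = pstdev(column)  # population std over the group at turn t
        for g, value in enumerate(column):
            advantages[g][t] = (value - mu) / (sigma + eps)
    return advantages
```

In this sketch a rollout is credited at a given turn only if it did better than its peers at that same turn, which is one simple way to get per-turn rather than per-trajectory credit; the method described in the paper may differ.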




