LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent

arXiv cs.AI · April 21, 2026


Key Points

  • The paper introduces LiteResearcher, a scalable reinforcement-learning (RL) training framework aimed at improving LLM-based “deep research” agents.
  • It argues that prior agentic RL scaling is limited by two coupled issues: synthetic training data that fails to elicit authentic real-world search behavior, and reliance on live real-world search during training, which causes instability and high cost.
  • LiteResearcher addresses this by creating a “lite” virtual world that imitates real-world search dynamics, enabling a continuously improving training recipe.
  • The framework allows a small (4B) search agent to outperform much larger models, achieving 71.3% on GAIA and 78.0% on Xbench, setting open-source SOTA results.
  • Overall, the work positions scalable RL training as a key enabler for practical and cost-effective deep research agents.

Abstract

Reinforcement Learning (RL) has emerged as a powerful training paradigm for LLM-based agents. However, scaling agentic RL for deep research remains constrained by two coupled challenges: hand-crafted synthetic data fails to elicit genuine real-world search capabilities, and dependency on real-world search during RL training introduces instability and prohibitive cost, limiting the scalability of agentic RL. LiteResearcher is a training framework that makes agentic RL scalable: by constructing a lite virtual world that mirrors real-world search dynamics, we enable a continuously improving training recipe that empowers a tiny search agent to outperform large-scale open-source and commercial models (e.g., Tongyi DeepResearch and Claude-4.5 Sonnet). Specifically, on common benchmarks such as GAIA and Xbench, our LiteResearcher-4B achieves open-source state-of-the-art results of 71.3% and 78.0% respectively, demonstrating that scalable RL training is a key enabler for Deep Research Agents.