ORBIT: Scalable and Verifiable Data Generation for Search Agents on a Tight Budget

arXiv cs.CL / 4/3/2026

Key Points

The paper introduces ORBIT, a synthetic training dataset of 20K reasoning-intensive search-agent queries with short, verifiable answers designed to reduce reliance on costly human annotation and paid APIs.
ORBIT is generated via a modular four-stage pipeline: seed creation, QA pair generation, self-verification, and external web-based verification.
The dataset covers 15 domains, and each training example requires 4–5 reasoning steps, with external verification requiring search over the complete web to confirm correctness.
Training Qwen3-4B on ORBIT with GRPO yields ORBIT-4B, which the authors report performs strongly among sub-4B LLMs as a search agent when evaluated on Wikipedia question-answering tasks.
The authors release the framework code and datasets publicly, emphasizing reproducibility and practical adoption for building search-agent training data on limited budgets.
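
The four stages listed above can be pictured as a simple filter chain. The following is only a minimal sketch under stated assumptions, not the authors' released code: the names `QAPair`, `run_pipeline`, and the `toy_*` functions are hypothetical, and the real generation and verification stages would call a language model and a web search tool rather than the toy rules used here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class QAPair:
    """Hypothetical record for one training example."""
    question: str
    answer: str   # short, verifiable answer string
    domain: str


def run_pipeline(
    seeds: Iterable[str],
    generate_qa: Callable[[str], List[QAPair]],
    self_verify: Callable[[QAPair], bool],
    external_verify: Callable[[QAPair], bool],
) -> List[QAPair]:
    """Keep only the QA pairs that pass both verification stages, in order."""
    kept: List[QAPair] = []
    for seed in seeds:                     # stage 1 output: seeds
        for pair in generate_qa(seed):     # stage 2: QA pair generation
            if not self_verify(pair):      # stage 3: self-verification
                continue
            if not external_verify(pair):  # stage 4: external verification
                continue
            kept.append(pair)
    return kept


# Toy stand-ins (assumptions for illustration only).
def toy_generate(seed: str) -> List[QAPair]:
    return [
        QAPair(f"Which year is linked to {seed}?", "1969", "history"),
        QAPair(f"Unanswerable question about {seed}", "", "history"),
    ]


def toy_self_verify(pair: QAPair) -> bool:
    # Stand-in for the generator re-checking its own answer.
    return bool(pair.answer.strip())


def toy_external_verify(pair: QAPair) -> bool:
    # Stand-in for confirming the answer against web search results.
    return pair.answer.isdigit()
```

Calling `run_pipeline(["Apollo 11"], toy_generate, toy_self_verify, toy_external_verify)` keeps only the pair with a non-empty numeric answer. Passing each stage as a callable mirrors the modular design the summary describes, since any stage can be swapped without touching the others.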

Abstract

Search agents, which integrate language models (LMs) with web search, are becoming crucial for answering complex user queries. Constructing training datasets for deep research tasks, which involve multi-step retrieval and reasoning, remains challenging due to expensive human annotation or cumbersome prerequisites. In this work, we introduce ORBIT, a training dataset of 20K reasoning-intensive queries with short verifiable answers, generated using a frugal framework without relying on paid API services. The modular framework relies on four stages: seed creation, question-answer pair generation, and two stages of verification: self and external. ORBIT spans 15 domains and each training pair requires 4-5 reasoning steps, with external search verification required from the complete web. We train Qwen3-4B as the base model on ORBIT using GRPO and evaluate it on Wikipedia question answering tasks. Extensive experimental results demonstrate that ORBIT-4B achieves strong performance among sub-4B LLMs as search agents, proving the utility of synthetic datasets. Our framework, code and datasets are open-sourced and available publicly.
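
Short verifiable answers matter for GRPO because each sampled response can be scored automatically, and GRPO then compares responses within a group sampled for the same query. The sketch below shows the standard group-relative advantage (reward minus group mean, divided by group standard deviation). The exact-match reward and the names `normalize`, `exact_match_reward`, and `group_relative_advantages` are assumptions for illustration; the abstract does not state which reward function the authors use.

```python
import statistics
from typing import List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def exact_match_reward(prediction: str, gold: str) -> float:
    """Assumed reward: 1.0 if the short answer matches the gold answer, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(gold) else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standardize rewards within one group of responses to the same query."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    # eps avoids division by zero when every response gets the same reward
    return [(r - mean) / (std + eps) for r in rewards]
```

For a group of four sampled answers `["paris", "Lyon", " Paris ", "Rome"]` against the gold answer `"Paris"`, the rewards are 1, 0, 1, 0 and the advantages come out close to +1, -1, +1, -1. When all responses in a group receive the same reward, every advantage is zero and that group contributes no learning signal.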