Search, Do not Guess: Teaching Small Language Models to Be Effective Search Agents

arXiv cs.AI / 4/7/2026


Key Points

  • Search-enabled agents are promising for knowledge-intensive tasks, but using full-scale LLMs as search agents is often too computationally expensive for practical deployment.
  • Experiments on complex multi-hop reasoning show that distilled small language models (SLMs) invoke search tools less often and hallucinate more, despite having less parametric knowledge than LLMs.
  • The paper proposes \policy, a lightweight fine-tuning method that explicitly trains SLMs to retrieve information reliably and generate answers grounded in the retrieved evidence.
  • Compared with LLM-to-SLM agent distillation, \policy reportedly improves performance by 17.3 points on Bamboogle and 15.3 points on HotpotQA, reaching LLM-level results across the evaluated benchmarks.
  • The authors also find that adaptive search strategies in SLMs can harm performance, implying that consistent search behavior is important for dependable reasoning.

Abstract

Agents equipped with search tools have emerged as effective solutions for knowledge-intensive tasks. While Large Language Models (LLMs) exhibit strong reasoning capabilities, their high computational cost limits practical deployment for search agents. Consequently, recent work has focused on distilling agentic behaviors from LLMs into Small Language Models (SLMs). Through comprehensive evaluation on complex multi-hop reasoning tasks, we find that despite possessing less parametric knowledge, SLMs invoke search tools less frequently and are more prone to hallucinations. To address this issue, we propose \policy, a lightweight fine-tuning approach that explicitly trains SLMs to reliably retrieve and generate answers grounded in retrieved evidence. Compared to agent distillation from LLMs, our approach improves performance by 17.3 points on Bamboogle and 15.3 points on HotpotQA, achieving LLM-level results across benchmarks. Our further analysis reveals that adaptive search strategies in SLMs often degrade performance, highlighting the necessity of consistent search behavior for reliable reasoning.
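The core idea described above — forcing the model to search before answering, rather than letting it decide adaptively — can be illustrated with a minimal sketch. This is not the paper's \policy implementation; `search` and `answer_with_mandatory_search` are hypothetical stand-ins (a toy keyword retriever and a stubbed grounded-generation step) used only to show the search-then-answer control flow.

```python
# Illustrative sketch of a forced-search agent loop (NOT the paper's method).
# A consistent policy: always retrieve first, then answer only from evidence,
# refusing when retrieval comes back empty instead of hallucinating.

def search(query, corpus):
    """Toy retriever: return passages sharing at least one word with the query."""
    terms = set(query.lower().split())
    return [p for p in corpus if terms & set(p.lower().split())]

def answer_with_mandatory_search(question, corpus):
    """Forced-search policy: retrieve, then ground the answer in the evidence."""
    evidence = search(question, corpus)
    if not evidence:
        # Consistent behavior: abstain rather than guess from parametric memory.
        return "insufficient evidence", []
    # Stand-in for a grounded SLM generation step: return the top passage.
    return evidence[0], evidence

corpus = [
    "Paris is the capital of France",
    "Mount Everest is the highest mountain",
]
answer, evidence = answer_with_mandatory_search("capital of France", corpus)
```

The contrast with an adaptive strategy (where the model may skip retrieval when it believes it already knows the answer) is exactly the failure mode the authors report for SLMs: skipped searches correlate with hallucinated answers.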