AI Navigate

Distilling Reasoning Without Knowledge: A Framework for Reliable LLMs

arXiv cs.CL / 3/17/2026


Key Points

  • The paper highlights the unreliability of LLMs on fact-seeking tasks when the relevant information is up-to-date or conflicting, even with retrieval and tool-use features.
  • It introduces a modular framework that explicitly separates planning from factual retrieval and answer synthesis to reduce hallucinations and improve efficiency.
  • A lightweight student planner is trained via a teacher-student setup to generate structured decompositions consisting of abstract reasoning steps and searchable fact requests, without revealing factual answers during training.
  • During inference, the planner outputs plans while retrieval and response synthesis are performed by prompt-engineered modules.
  • On the SEAL-0 benchmark, supervised planning improves both accuracy and latency compared to monolithic reasoning models and prompt-based tool-augmented frameworks, showing that explicit planning structures enhance reliability of fact-seeking LLMs.
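The separation described in the points above can be sketched as a minimal pipeline. This is a hypothetical illustration, not the paper's implementation: the function names are invented, and the stubs stand in for the distilled student planner and the prompt-engineered retrieval and synthesis modules.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    # Abstract reasoning steps: no factual content appears here.
    steps: list
    # Searchable fact requests; the planner never resolves them itself.
    fact_requests: list

def plan(question: str) -> Plan:
    # Stand-in for the lightweight student planner (in the paper,
    # a model trained on teacher planning traces).
    return Plan(
        steps=["identify the entity in question",
               "look up the requested attribute"],
        fact_requests=[f"search: {question}"],
    )

def retrieve(fact_requests: list) -> dict:
    # Stand-in for the prompt-engineered retrieval module
    # (e.g. a search tool); returns evidence keyed by request.
    return {req: f"evidence for '{req}'" for req in fact_requests}

def synthesize(question: str, p: Plan, evidence: dict) -> str:
    # Stand-in for the prompt-engineered answer-synthesis module.
    facts = "; ".join(evidence.values())
    return f"Answer to '{question}' grounded in: {facts}"

def answer(question: str) -> str:
    p = plan(question)                    # planning (learned)
    ev = retrieve(p.fact_requests)        # retrieval (prompted)
    return synthesize(question, p, ev)    # synthesis (prompted)
```

The point of the decomposition is that only `plan` is trained; `retrieve` and `synthesize` remain swappable prompt-engineered components, so factual knowledge never has to live in the planner's weights.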

Abstract

Fact-seeking question answering with large language models (LLMs) remains unreliable when answers depend on up-to-date or conflicting information. Although retrieval-augmented and tool-using LLMs reduce hallucinations, they often rely on implicit planning, leading to inefficient tool usage. We propose a modular framework that explicitly separates planning from factual retrieval and answer synthesis. A lightweight student planner is trained via a teacher-student framework to generate structured decompositions consisting of abstract reasoning steps and searchable fact requests. The supervision signals contain only planning traces and fact requests, without providing factual answers or retrieved evidence. At inference, the planner produces plans, while prompt-engineered modules perform retrieval and response synthesis. We evaluate the proposed framework on SEAL-0, an extremely challenging benchmark for search-augmented LLMs. Results show that supervised planning improves both accuracy and latency compared to monolithic reasoning models and prompt-based tool-augmented frameworks, demonstrating that explicitly learned planning structures are essential for reliable fact-seeking LLMs.
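The supervision design in the abstract, training on planning traces and fact requests while withholding answers and evidence, can be illustrated with a hypothetical data-preparation step. The record layout and field names below are assumptions for the sketch, not the paper's actual format.

```python
def make_supervision_example(question: str, teacher_trace: dict) -> dict:
    """Reduce a raw teacher trace to a planning-only training target.

    The raw trace is assumed to contain abstract steps, fact requests,
    retrieved evidence, and a final answer; only the first two are kept,
    so the student learns to plan rather than to recall facts.
    """
    return {
        "input": question,
        "target": {
            "steps": teacher_trace["steps"],
            "fact_requests": teacher_trace["fact_requests"],
            # deliberately no "evidence" and no "answer" fields
        },
    }
```

Filtering the teacher output this way is what makes the distillation "without knowledge": the student planner is never shown what the facts are, only which facts to ask for.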