Thinking with Reasoning Skills: Fewer Tokens, More Accuracy

arXiv cs.AI / 4/25/2026


Key Points

  • The article proposes a new method for reasoning LLMs that reduces wasted tokens spent on long intermediate traces by reusing distilled reasoning “skills.”
  • Instead of reasoning from scratch for every query, the model first retrieves relevant stored skills for the current problem to avoid redundant detours.
  • The approach creates a reusable set of reasoning skills by summarizing and storing insights distilled from prior extensive deliberation and trial-and-error exploration.
  • Experiments on coding and mathematical reasoning tasks show fewer reasoning tokens and improved overall performance compared with the prevailing reasoning-from-scratch paradigm.
  • The authors argue that the reduced per-request token cost makes the method economically attractive for practical real-world deployment.

Abstract

Reasoning LLMs often spend substantial tokens on long intermediate reasoning traces (e.g., chain-of-thought) when solving new problems. We propose to summarize and store reusable reasoning skills distilled from extensive deliberation and trial-and-error exploration, and to retrieve these skills at inference time to guide future reasoning. Unlike the prevailing "reasoning from scratch" paradigm, our approach first recalls relevant skills for each query, helping the model avoid redundant detours and focus on effective solution paths. We evaluate our method on coding and mathematical reasoning tasks, and find that it significantly reduces reasoning tokens while improving overall performance. The resulting lower per-request cost indicates strong practical and economic potential for real-world deployment.
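The retrieve-then-reason idea described above can be sketched as a small pipeline: distilled skill strings live in a store, the most relevant ones are retrieved for each query, and they are prepended to the prompt so the model starts from prior insights rather than a blank trace. The skill strings, the toy bag-of-words embedding, and all function names below are illustrative assumptions, not the paper's actual implementation; a real system would use a learned encoder and an LLM-driven distillation step.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words embedding; a real system would use a learned encoder.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical skill store: insight strings distilled from prior
# deliberation and trial-and-error exploration (invented examples).
SKILLS = [
    "modular arithmetic: reduce large exponents with Euler's theorem",
    "two pointers: shrink a window from both ends of a sorted array",
    "dynamic programming: cache overlapping subproblems in a table",
]

def retrieve_skills(query, k=1):
    # Rank stored skills by similarity to the current problem.
    q = embed(query)
    ranked = sorted(SKILLS, key=lambda s: cosine(embed(s), q), reverse=True)
    return ranked[:k]

def build_prompt(query):
    # Prepend retrieved skills so the model can skip redundant detours
    # instead of reasoning from scratch.
    hints = "\n".join(f"- {s}" for s in retrieve_skills(query))
    return f"Relevant skills:\n{hints}\n\nProblem: {query}"

print(build_prompt("cache subproblems in a dp table for subarray sums"))
```

The token savings come from the prompt side: a few retrieved skill lines replace the long exploratory portion of the chain-of-thought the model would otherwise regenerate per request.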