TInR: Exploring Tool-Internalized Reasoning in Large Language Models

arXiv cs.CL · April 14, 2026


Key Points

  • The paper introduces Tool-Internalized Reasoning (TInR) to improve tool-integrated reasoning by internalizing tool knowledge into LLMs rather than relying on external tool documentation during inference.
  • It identifies key challenges for TInR, including (1) internalizing tool knowledge and (2) coordinating internal reasoning with actual tool usage.
  • The authors propose TInR-U, a unified framework trained with a three-phase pipeline: bidirectional knowledge alignment, supervised fine-tuning with high-quality reasoning annotations, and reinforcement learning using TInR-specific rewards.
  • Experiments across in-domain and out-of-domain tasks indicate that TInR-U delivers stronger performance while also improving efficiency, suggesting the approach mitigates the tool-size constraints and inference inefficiency of documentation-dependent methods.
  • The work frames TInR as an architectural/training direction for making LLMs more effective at using tools without incurring the overhead and constraints of external documentation reliance.
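The three-phase pipeline above can be sketched as a simple training driver. This is a hedged, minimal illustration of the pipeline's structure only: the function names (`align_tool_knowledge`, `sft_warmup`, `rl_with_tinr_rewards`) and the dictionary stand-in for a model checkpoint are hypothetical, since the paper does not publish an API, and each phase body is a placeholder rather than real training code.

```python
# Hypothetical sketch of the TInR-U three-phase training pipeline.
# All names and data structures are assumptions for illustration;
# they do not come from the paper or any released codebase.

def align_tool_knowledge(model, tool_docs):
    """Phase 1: bidirectional knowledge alignment — internalize
    tool documentation into the model (placeholder)."""
    model["internalized_tools"] = list(tool_docs)
    return model

def sft_warmup(model, annotations):
    """Phase 2: supervised fine-tuning warm-up on high-quality
    reasoning annotations (placeholder)."""
    model["sft_examples"] = len(annotations)
    return model

def rl_with_tinr_rewards(model, episodes, reward_fn):
    """Phase 3: reinforcement learning with TInR-specific
    rewards (placeholder: just averages the rewards)."""
    rewards = [reward_fn(e) for e in episodes]
    model["avg_reward"] = sum(rewards) / len(rewards)
    return model

def train_tinr_u(tool_docs, annotations, episodes, reward_fn):
    """Run the three phases in sequence on a fresh checkpoint."""
    model = {}  # stand-in for an LLM checkpoint
    model = align_tool_knowledge(model, tool_docs)
    model = sft_warmup(model, annotations)
    model = rl_with_tinr_rewards(model, episodes, reward_fn)
    return model

if __name__ == "__main__":
    trained = train_tinr_u(
        tool_docs=["calculator", "web_search"],
        annotations=["trace_a", "trace_b", "trace_c"],
        episodes=[0.5, 1.0],
        reward_fn=lambda r: r,  # identity reward for the sketch
    )
    print(trained)
```

The point of the sketch is the ordering constraint the paper emphasizes: tool knowledge is internalized *before* reasoning is tuned, so later phases never consult external documentation.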

Abstract

Tool-Integrated Reasoning (TIR) has emerged as a promising direction by extending Large Language Models' (LLMs) capabilities with external tools during reasoning. Existing TIR methods typically rely on external tool documentation during reasoning. However, this leads to tool mastery difficulty, tool size constraints, and inference inefficiency. To mitigate these issues, we explore Tool-Internalized Reasoning (TInR), which aims to facilitate reasoning with tool knowledge internalized into LLMs. Achieving this goal imposes notable requirements, including tool internalization and tool-reasoning coordination. To address them, we propose TInR-U, a tool-internalized reasoning framework for unified reasoning and tool usage. TInR-U is trained through a three-phase pipeline: 1) tool internalization with a bidirectional knowledge alignment strategy; 2) supervised fine-tuning warm-up using high-quality reasoning annotations; and 3) reinforcement learning with TInR-specific rewards. We comprehensively evaluate our method across in-domain and out-of-domain settings. Experimental results show that TInR-U achieves superior performance in both settings, highlighting its effectiveness and efficiency.