Agentic Tool Use in Large Language Models

arXiv cs.CL / 4/3/2026

Key Points

  • The paper argues that LLMs used as autonomous agents succeed in real settings only when paired with reliable tool use for retrieval, computation, and executing external actions.
  • It surveys fragmented prior work and proposes a unifying taxonomy that divides agentic tool use into three paradigms: plug-and-play prompting, supervised tool learning, and reward-driven tool policy learning.
  • The authors compare each paradigm’s methods, strengths, and typical failure modes, explaining how tool-use behaviors differ across training and usage settings.
  • It also reviews how tool-use evaluation is done today and identifies key open challenges that hinder consistent progress and measurement across studies.
  • The goal is to reduce literature fragmentation and provide a structured “evolutionary” view that can guide future research and development of agentic tool-use systems.

Abstract

Large language models are increasingly deployed as autonomous agents, yet their real-world effectiveness depends on reliable tools for information retrieval, computation, and external action. Existing studies remain fragmented across tasks, tool types, and training settings, and lack a unified view of how tool-use methods differ and evolve. This paper organizes the literature into three paradigms: prompting as plug-and-play, supervised tool learning, and reward-driven tool policy learning. It analyzes their methods, strengths, and failure modes, reviews the evaluation landscape, and highlights key challenges, aiming to address this fragmentation and provide a more structured, evolutionary view of agentic tool use.
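
To make the first paradigm concrete, the following is a minimal sketch of what "prompting as plug-and-play" could look like in practice: tools are described in the prompt, the model replies with a structured tool request, and a small runtime dispatches it, with no change to model weights. The abstract gives no implementation detail, so everything here is an assumption for illustration: the tool names (`add`, `lookup`), the JSON reply format, and `fake_model`, which is a canned stand-in for a real LLM call.

```python
import json
from typing import Callable, Dict

# Hypothetical tool registry; the names and behaviors are illustrative only.
TOOLS: Dict[str, Callable[..., object]] = {
    "add": lambda a, b: a + b,
    "lookup": lambda key: {"capital_of_france": "Paris"}.get(key, "unknown"),
}


def build_prompt(question: str) -> str:
    """Describe the available tools in the prompt text itself."""
    tool_list = ", ".join(sorted(TOOLS))
    return (
        f"Tools available: {tool_list}.\n"
        'Reply with JSON like {"tool": "add", "args": {"a": 1, "b": 2}}.\n'
        f"Question: {question}"
    )


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM call (assumption): always requests add(2, 3)."""
    return json.dumps({"tool": "add", "args": {"a": 2, "b": 3}})


def run_tool_call(model_output: str) -> object:
    """Parse the model's JSON reply and dispatch to the named tool."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return "error: reply was not valid JSON"
    if not isinstance(call, dict):
        return "error: reply was not a JSON object"
    name = call.get("tool")
    tool = TOOLS.get(name)
    if tool is None:
        return f"error: unknown tool {name!r}"
    return tool(**call.get("args", {}))


def answer(question: str) -> object:
    """One prompt -> tool request -> tool result round trip."""
    return run_tool_call(fake_model(build_prompt(question)))


if __name__ == "__main__":
    print(answer("What is 2 + 3?"))
```

The error branches in `run_tool_call` hint at the kind of failure modes such a setup has to handle (malformed replies, requests for tools that do not exist). The other two paradigms would, broadly, replace the fixed prompt with a model trained on tool-use demonstrations or optimized against a reward signal; that is beyond what this sketch covers.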