Tool Retrieval Bridge: Aligning Vague Instructions with Retriever Preferences via Bridge Model

arXiv cs.CL / 4/10/2026


Key Points

  • The paper addresses a real-world mismatch in tool retrieval for LLMs, where benchmarks use overly specific tool instructions while actual user requests are often vague.
  • It introduces a new benchmark, VGToolBench, designed to simulate human-like vague instructions and shows that such vagueness significantly degrades tool retrieval performance.
  • The proposed Tool Retrieval Bridge (TRB) uses a bridge model to rewrite vague instructions into more specific ones that better match retriever expectations, reducing instruction ambiguity.
  • Experiments across multiple retrieval settings show TRB delivers consistent, substantial gains for several baseline retrievers, including a BM25 relative improvement of up to 111.51% (NDCG from 9.73 to 19.59).
  • The authors provide publicly available code and models to support replication and further tool-retrieval research.
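The rewrite-then-retrieve idea behind TRB can be sketched in a few lines. The snippet below is purely illustrative: the bridge model is stubbed with a hard-coded rewrite (the paper uses a trained rewriting model), and the tool corpus, query, and minimal BM25 implementation are assumptions for demonstration, not the authors' actual components.

```python
import math
from collections import Counter

class BM25:
    """Minimal BM25 ranker over a small, hypothetical tool-description corpus."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = [d.lower().replace(":", "").split() for d in docs]
        self.k1, self.b = k1, b
        self.N = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.N
        self.df = Counter(t for d in self.docs for t in set(d))

    def score(self, query, idx):
        doc = self.docs[idx]
        tf = Counter(doc)
        s = 0.0
        for t in query.lower().split():
            if t not in tf:
                continue
            idf = math.log(1 + (self.N - self.df[t] + 0.5) / (self.df[t] + 0.5))
            norm = tf[t] + self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
            s += idf * tf[t] * (self.k1 + 1) / norm
        return s

    def rank(self, query, k=3):
        scored = sorted(((self.score(query, i), i) for i in range(self.N)), reverse=True)
        return [i for _, i in scored[:k]]

def bridge_rewrite(vague_instruction):
    # Stand-in for the bridge model: in TRB this would be an LLM that expands
    # a vague user request into a retriever-friendly, tool-specific query.
    # The single hard-coded rewrite below is a made-up example.
    rewrites = {
        "what's the weather like": "get_current_weather city temperature forecast",
    }
    return rewrites.get(vague_instruction, vague_instruction)

# Hypothetical tool documents (not from VGToolBench).
tool_docs = [
    "get_current_weather: returns temperature and forecast for a city",
    "send_email: send an email message to a recipient",
    "convert_currency: convert an amount between two currencies",
]
bm25 = BM25(tool_docs)
vague = "what's the weather like"
print("raw query top-1:", bm25.rank(vague, k=1))
print("rewritten top-1:", bm25.rank(bridge_rewrite(vague), k=1))
```

On the raw vague query, no token overlaps the tool descriptions, so BM25 has no signal; after the rewrite, lexical overlap ("temperature", "forecast", "city") lets BM25 surface the weather tool. This mirrors the paper's point that rewriting closes the gap between vague instructions and retriever preferences.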

Abstract

Tool learning has emerged as a promising paradigm for large language models (LLMs) to address real-world challenges. Because the set of available tools is extensive and irregularly updated, tool retrieval for selecting the desired tool subset is essential. However, current tool retrieval methods are usually evaluated on academic benchmarks containing overly detailed instructions (e.g., specific API names and parameters), while real-world instructions are more vague. Such a discrepancy hinders tool retrieval in real-world applications. In this paper, we first construct a new benchmark, VGToolBench, to simulate human vague instructions. Based on this, we conduct a series of preliminary analyses and find that vague instructions indeed damage the performance of tool retrieval. To this end, we propose a simple-yet-effective Tool Retrieval Bridge (TRB) approach to boost the performance of tool retrieval for vague instructions. The principle of TRB is to introduce a bridge model that rewrites vague instructions into more specific ones, alleviating the gap between vague instructions and retriever preferences. We conduct extensive experiments under multiple commonly used retrieval settings, and the results show that TRB effectively mitigates the ambiguity of vague instructions while delivering consistent and substantial improvements across all baseline retrievers. For example, with the help of TRB, BM25 achieves a relative improvement of up to 111.51%, i.e., increasing the average NDCG score from 9.73 to 19.59. The source code and models are publicly available at https://github.com/kfchenhn/TRB.