ToolGrad: Efficient Tool-use Dataset Generation with Textual "Gradients"

arXiv cs.CL / 5/4/2026


Key Points

  • The paper introduces ToolGrad, an agentic framework for generating tool-use datasets that avoids failure-prone, annotation-heavy pipelines used in prior “query-first” approaches.
  • Instead of creating a user query and then adding complex tool-use annotations, ToolGrad uses iterative steps guided by textual “gradients” to build valid tool-use chains first (“answer-first”), and then synthesizes the matching user queries.
  • Using this framework, the authors produced ToolGrad-500, a dataset exhibiting more complex tool usage, lower generation cost, and a near-100% pass rate for generated samples.
  • Experiments indicate that models trained on ToolGrad's datasets outperform models trained on more expensive baseline datasets, and even some proprietary LLMs.
  • The authors provide the source code, dataset, and models publicly via GitHub for replication and further research.
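The answer-first loop described above can be sketched roughly as follows. This is a minimal, hypothetical illustration: the function names, feedback strings, and data shapes are invented here for clarity, and the real ToolGrad pipeline uses LLM prompts and live tool execution in place of these placeholder functions.

```python
# Hypothetical sketch of ToolGrad's "answer-first" generation loop.
# All names and logic here are illustrative stand-ins, not the paper's API:
# real implementations would call an LLM for proposals/critique and
# actually execute tools, using the textual "gradient" to repair failures.

def propose_tool_call(chain, feedback):
    # Placeholder for an LLM proposing the next tool call, conditioned
    # on the chain so far and the critic's textual-gradient feedback.
    return {"tool": f"tool_{len(chain)}", "args": {"note": feedback}}

def execute(call):
    # Placeholder for running the tool. In practice calls can fail,
    # and the failure message feeds into the next textual gradient.
    return f"result of {call['tool']}"

def textual_gradient(chain):
    # Placeholder critic: natural-language feedback on how to extend
    # or fix the chain -- the "gradient" in ToolGrad's terminology.
    return "extend with one more dependent call" if len(chain) < 3 else "stop"

def synthesize_query(chain):
    # Placeholder for the final step: an LLM writes the user query
    # that the already-validated tool chain answers.
    return f"user query answered by {len(chain)} chained tool calls"

def answer_first_sample(max_steps=5):
    """Build a valid tool chain first, then synthesize the query."""
    chain, feedback = [], "start"
    for _ in range(max_steps):
        call = propose_tool_call(chain, feedback)
        call["result"] = execute(call)
        chain.append(call)
        feedback = textual_gradient(chain)
        if feedback == "stop":
            break
    return {"query": synthesize_query(chain), "chain": chain}

sample = answer_first_sample()
```

Because every chain is validated by execution before the query is written, a sample only exists if its annotation already succeeded, which is why this inversion can approach a 100% pass rate where query-first pipelines frequently discard failed annotations.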

Abstract

Prior work synthesizes tool-use LLM datasets by first generating a user query, followed by complex tool-use annotations like depth-first search (DFS). This leads to inevitable annotation failures and low efficiency in data generation. We introduce ToolGrad, an agentic framework that inverts this paradigm. ToolGrad first constructs valid tool-use chains through an iterative process guided by textual "gradients", and then synthesizes corresponding user queries. This "answer-first" approach led to ToolGrad-500, a dataset generated with more complex tool use, lower cost, and almost 100% pass rate. Experiments show that ToolGrad models outperform those trained on expensive baseline datasets and proprietary LLMs. The ToolGrad source code, dataset, and models are available at https://github.com/zhongyi-zhou/toolgrad.