AutoTool: Automatic Scaling of Tool-Use Capabilities in RL via Decoupled Entropy Constraints

arXiv cs.AI / 3/17/2026

Key Points

The paper proposes AutoTool, a training paradigm that combines warm-up supervised fine-tuning with reinforcement learning to automatically determine appropriate reasoning trajectories for tool use.
It finds that entropy-based optimization objectives maintain model diversity while unlocking the model's ability to scale thinking length, and builds on this with an entropy-based long-short reasoning fusion RL strategy.
The approach addresses two RL scaling challenges: direct RL training often fails to scale thinking length enough for complex problems, and scaled-up models overthink simple problems, wasting tokens.
Experimental results on three benchmarks show a 9.8% accuracy improvement and a roughly 81% reduction in computational overhead, demonstrating effective auto-scaling for tool use.
This work advances scalable tool-use capabilities in RL, potentially improving efficiency and performance in AI agents.

Abstract

Tool use represents a critical capability for AI agents, with recent advances focusing on leveraging reinforcement learning (RL) to scale up the explicit reasoning process to achieve better performance. However, current RL-based scaling approaches face two key challenges for tool use: (a) direct RL training often struggles to scale up thinking length sufficiently to solve complex problems, and (b) scaled-up models tend to overthink simpler problems, resulting in substantial token inefficiency. To address these challenges, we propose a novel training paradigm that first employs warm-up supervised fine-tuning to help models distinguish between simple and complex problems, followed by RL that enables models to automatically determine appropriate reasoning trajectories. Furthermore, to tackle the issue of automatic thinking-length scaling, we discover that entropy-based optimization objectives effectively maintain model diversity while successfully unlocking the model's scaling capabilities. Based on this insight, we introduce an entropy-based long-short reasoning fusion RL strategy. Our experiments on three benchmarks demonstrate that the model successfully achieves auto-scaling for efficient tool use, improving accuracy by 9.8% while reducing computational overhead by ~81%.
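
The abstract does not spell out the entropy-based objective, so the following is only a minimal sketch of the general idea of an entropy bonus in policy optimization, not the paper's formulation. It computes the Shannon entropy of next-token distributions and subtracts a weighted entropy term from a policy loss. The function names, the `is_long` flag, and the separate coefficients `coef_long` and `coef_short` are hypothetical, chosen here as one possible reading of applying different entropy constraints to long and short reasoning trajectories.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(trajectory_logits: Sequence[Sequence[float]]) -> float:
    """Average per-token entropy over one sampled trajectory."""
    entropies = [token_entropy(step) for step in trajectory_logits]
    return sum(entropies) / len(entropies)


def entropy_regularized_loss(
    policy_loss: float,
    trajectory_logits: Sequence[Sequence[float]],
    is_long: bool,
    coef_long: float = 0.01,   # hypothetical weight for long-reasoning trajectories
    coef_short: float = 0.001,  # hypothetical weight for short-reasoning trajectories
) -> float:
    """Subtract an entropy bonus from the policy loss.

    Assumption: long and short trajectories get separate coefficients.
    This is an illustrative choice, not taken from the source.
    """
    coef = coef_long if is_long else coef_short
    return policy_loss - coef * mean_entropy(trajectory_logits)


if __name__ == "__main__":
    uniform = [[0.0, 0.0, 0.0, 0.0]]  # maximally uncertain token
    peaked = [[10.0, 0.0, 0.0, 0.0]]  # nearly deterministic token
    print(mean_entropy(uniform), mean_entropy(peaked))
    print(entropy_regularized_loss(1.0, uniform, is_long=True))
```

A uniform distribution has the highest entropy, so under this sketch a policy that stays diverse on a trajectory receives a larger bonus (a lower loss) than one that has collapsed to near-deterministic outputs. In a real training loop the same quantity would be computed on tensors with automatic differentiation rather than on Python lists.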
