AutoTool: Automatic Scaling of Tool-Use Capabilities in RL via Decoupled Entropy Constraints
arXiv cs.AI / 3/17/2026
Key Points
- The paper proposes AutoTool, a training paradigm that combines warm-up supervised fine-tuning with reinforcement learning to automatically determine appropriate reasoning trajectories for tool use.
- It shows that entropy-based optimization objectives help maintain model diversity, and it uses an entropy-based RL strategy that fuses long and short reasoning to make long-range reasoning scalable.
- The approach addresses two challenges in scaling RL: thinking length that fails to scale up, and inefficiency from overthinking simple problems.
- Experimental results on three benchmarks show a 9.8% accuracy improvement and a reduction of about 81% in computational overhead, indicating that the method scales tool-use reasoning automatically.
- The work advances scalable tool use in RL and could make AI agents both more efficient and more accurate.
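
For readers unfamiliar with the entropy-based objectives mentioned in the key points, the sketch below shows how Shannon entropy measures the diversity of a policy's output distribution and how a generic entropy bonus enters a loss. It is an illustration only, not the paper's formulation: the function names, the coefficient `beta`, and the example logits are all assumptions made for this sketch, and the paper's decoupled constraints are not reproduced here.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a more diverse distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_regularized_loss(task_loss, probs, beta=0.01):
    """Generic entropy bonus (hypothetical helper): subtracting beta * H
    rewards the policy for keeping its output distribution spread out."""
    return task_loss - beta * entropy(probs)


# Assumed toy logits over four candidate actions.
uniform = softmax([0.0, 0.0, 0.0, 0.0])  # maximally diverse
peaked = softmax([10.0, 0.0, 0.0, 0.0])  # nearly deterministic

print(entropy(uniform))
print(entropy(peaked))
print(entropy_regularized_loss(1.0, uniform, beta=0.1))
```

The uniform distribution reaches the maximum entropy for four outcomes, ln 4, while the peaked one is close to zero. An entropy term in the objective therefore gives the optimizer a handle on how much exploration the policy retains.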




