HTAA: Enhancing LLM Planning via Hybrid Toolset Agentization & Adaptation

arXiv cs.CL / 4/14/2026


Key Points

  • The paper introduces HTAA, a hierarchical framework that improves how LLMs plan over and reliably use hundreds of tools in real-world applications.
  • HTAA reduces inefficiency and error accumulation from flat tool-calling by “agentizing” frequently co-used tools into specialized agent tools, shrinking the planner’s action space.
  • It uses Asymmetric Planner Adaptation with trajectory-based training, aligning the high-level planner to agent tools through backward reconstruction and forward refinement.
  • Experiments on the InfoVerify dataset (built from the point-of-interest (POI) validation workflow of a large ride-hailing platform) and on multiple public benchmarks show higher task success, shorter tool-calling trajectories, and lower context overhead than strong baselines.
  • The authors report production deployment benefits, including reduced manual validation effort and operational cost, supporting practical effectiveness.
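
To make the agentization idea in the key points more concrete, here is a minimal sketch of grouping frequently co-used tools from logged trajectories. It is an illustration under stated assumptions, not the authors' procedure: the pairwise co-occurrence count, the `min_count` threshold, the union-find merging, and all tool names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def group_co_used_tools(trajectories, min_count=2):
    """Group tools that appear together in at least `min_count` trajectories.

    Hypothetical grouping rule for illustration only; the paper may use a
    different criterion for deciding which tools form an agent tool.
    """
    # Count, for every unordered tool pair, how many trajectories contain both.
    pair_counts = Counter()
    for traj in trajectories:
        for a, b in combinations(sorted(set(traj)), 2):
            pair_counts[(a, b)] += 1

    # Union-find over tools: merge every pair that co-occurs often enough.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for traj in trajectories:
        for tool in traj:
            find(tool)  # register every tool, including ones never merged
    for (a, b), n in pair_counts.items():
        if n >= min_count:
            parent[find(a)] = find(b)

    # Collect the connected components as candidate agent tools.
    groups = {}
    for tool in parent:
        groups.setdefault(find(tool), set()).add(tool)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical logged trajectories (tool names are made up).
trajectories = [
    ["geocode", "reverse_geocode", "search_web"],
    ["geocode", "reverse_geocode", "check_hours"],
    ["search_web", "fetch_page"],
    ["search_web", "fetch_page", "check_hours"],
]
groups = group_co_used_tools(trajectories, min_count=2)
print(groups)
```

In this toy log the planner would choose among three actions (two grouped agent tools and one standalone tool) instead of five individual tools, which is the kind of action-space reduction the summary describes.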

Abstract

Enabling large language models to reliably use hundreds of tools at scale is critical for real-world applications, yet challenging because flat tool-calling architectures are inefficient and accumulate errors. To address this, we propose Hybrid Toolset Agentization & Adaptation (HTAA), a hierarchical framework for scalable tool-use planning. We introduce a novel toolset agentization paradigm, which encapsulates frequently co-used tools into specialized agent tools, thereby reducing the planner's action space and mitigating redundancy. To ensure effective coordination, we design Asymmetric Planner Adaptation, a trajectory-based training paradigm that aligns the high-level planner with agent tools via backward reconstruction and forward refinement. To validate HTAA, we conduct experiments on a real-world internal dataset, InfoVerify, which is based on the POI validation workflow of China's largest online ride-hailing platform and features long-horizon executable tool trajectories. Experiments on InfoVerify and widely used benchmarks show that HTAA consistently achieves higher task success rates, requires shorter tool-calling trajectories, and significantly reduces context overhead compared to strong baselines. Furthermore, in a production deployment, HTAA substantially reduces manual validation effort and operational cost, demonstrating its practical efficacy.
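
The claim about shorter trajectories can be illustrated with a small sketch that rewrites a flat tool-call sequence at the agent-tool level. This is an assumption-laden toy, not the paper's backward reconstruction step, whose details the abstract does not give: the `tool_to_agent` mapping, the rule of merging consecutive calls owned by the same agent, and all names are hypothetical.

```python
def collapse_trajectory(flat_calls, tool_to_agent):
    """Rewrite a flat tool-call sequence as a planner-level sequence.

    Consecutive calls owned by the same agent tool become one agent call;
    tools without an owning agent are kept as direct calls.
    (Hypothetical rule, chosen only to illustrate the length reduction.)
    """
    collapsed = []
    for tool in flat_calls:
        action = tool_to_agent.get(tool, tool)
        if not collapsed or collapsed[-1] != action:
            collapsed.append(action)
    return collapsed


# Hypothetical mapping from primitive tools to agent tools.
tool_to_agent = {
    "geocode": "location_agent",
    "reverse_geocode": "location_agent",
    "search_web": "web_agent",
    "fetch_page": "web_agent",
}
flat = ["geocode", "reverse_geocode", "search_web",
        "fetch_page", "fetch_page", "check_hours"]

collapsed = collapse_trajectory(flat, tool_to_agent)
print(len(flat), "->", len(collapsed), collapsed)
```

Here six primitive calls become three planner-level actions, so the planner's context carries fewer steps; the primitive calls would still run, but inside the agent tools rather than in the planner's own trajectory.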