SkillGraph: Graph Foundation Priors for LLM Agent Tool Sequence Recommendation

arXiv cs.LG / 4/23/2026


Key Points

  • The paper argues that semantic-similarity methods for LLM agent tool selection and ordering break down on ordering, because the correct order depends on inter-tool data dependencies that tool descriptions do not capture.
  • It introduces SkillGraph, a directed weighted execution-transition graph mined from 49,831 successful agent trajectories to capture reusable workflow-precedence regularities.
  • The proposed approach uses a two-stage decoupled pipeline: GS-Hybrid retrieval for generating tool candidates and a learned pairwise reranker to determine the correct order.
  • Experiments report Set-F1 = 0.271 and Kendall-τ = 0.096 on ToolBench, and a Kendall-τ improvement on API-Bank from -0.433 to +0.613, turning a negative ordering correlation into a positive one.
  • With identical Stage-1 inputs, the learned reranker is reported to outperform LLaMA-3.1-8B-based rerankers in Stage 2.
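
To make the idea of an execution-transition graph concrete, here is a minimal Python sketch under stated assumptions. It is not the paper's implementation: the function names (`build_transition_graph`, `order_candidates`), the toy trajectories, and the tool names are all hypothetical, and the paper's learned pairwise reranker is replaced here by a simple count-based precedence score.

```python
from collections import defaultdict


def build_transition_graph(trajectories):
    """Build a directed weighted graph from consecutive tool calls.

    Assumption: each trajectory is a list of tool names in execution order.
    The weight of edge src -> dst is the fraction of src's observed
    outgoing transitions that went to dst.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for trajectory in trajectories:
        for src, dst in zip(trajectory, trajectory[1:]):
            counts[src][dst] += 1

    graph = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        graph[src] = {dst: n / total for dst, n in dsts.items()}
    return graph


def order_candidates(graph, candidates):
    """Order candidates by net precedence (a stand-in for a learned reranker).

    A tool scores higher when the graph says it tends to come before the
    other candidates more often than after them.
    """
    def net_precedence(tool):
        return sum(
            graph.get(tool, {}).get(other, 0.0)
            - graph.get(other, {}).get(tool, 0.0)
            for other in candidates
            if other != tool
        )

    return sorted(candidates, key=net_precedence, reverse=True)


# Hypothetical successful trajectories (illustrative only).
trajectories = [
    ["search_flights", "select_flight", "book_flight"],
    ["search_flights", "select_flight", "book_flight", "send_receipt"],
    ["search_flights", "book_flight"],
]

graph = build_transition_graph(trajectories)
ordered = order_candidates(
    graph, ["book_flight", "search_flights", "select_flight"]
)
print(ordered)  # ['search_flights', 'select_flight', 'book_flight']
```

The sketch recovers the order from observed transitions alone. The tool names could be swapped for opaque identifiers and the result would be the same, which is the kind of signal that descriptions by themselves do not provide.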

Abstract

LLM agents must select tools from large API libraries and order them correctly. Existing methods use semantic similarity for both retrieval and ordering, but ordering depends on inter-tool data dependencies that are absent from tool descriptions. As a result, semantic-only methods can produce negative Kendall-τ in structured workflow domains. We introduce SkillGraph, a directed weighted execution-transition graph mined from 49,831 successful LLM agent trajectories, which encodes workflow-precedence regularities as a reusable graph foundation prior. Building on this graph foundation prior, we propose a two-stage decoupled framework: GS-Hybrid retrieval for candidate selection and a learned pairwise reranker for ordering. On ToolBench (9,965 test instances; ~16,000 tools), the method reaches Set-F1 = 0.271 and Kendall-τ = 0.096; on API-Bank, Kendall-τ improves from -0.433 to +0.613. Under identical Stage-1 inputs, the learned reranker also outperforms LLaMA-3.1-8B Stage-2 rerankers.
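
The two reported metrics can be illustrated with a short Python sketch. The abstract does not give the exact evaluation protocol, so this assumes the common definitions: Set-F1 as the F1 score between the predicted and reference tool sets, and Kendall-τ (tau-a) over the tools that appear in both sequences. The function names and the example tools are hypothetical.

```python
from itertools import combinations


def set_f1(predicted, reference):
    """F1 between the predicted and reference tool sets (order ignored)."""
    pred, ref = set(predicted), set(reference)
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def kendall_tau(predicted, reference):
    """Kendall tau-a over tools present in both sequences.

    Assumption: neither sequence contains duplicate tools. Returns 0.0
    when fewer than two tools are shared, since no pair can be compared.
    """
    ref_rank = {tool: i for i, tool in enumerate(reference)}
    shared = [tool for tool in predicted if tool in ref_rank]
    pairs = list(combinations(shared, 2))  # (a, b): a precedes b in predicted
    if not pairs:
        return 0.0
    concordant = sum(1 for a, b in pairs if ref_rank[a] < ref_rank[b])
    discordant = len(pairs) - concordant
    return (concordant - discordant) / len(pairs)


reference = ["login", "search", "add_to_cart", "checkout"]

# One adjacent swap: 5 of 6 pairs agree with the reference order.
print(kendall_tau(["login", "add_to_cart", "search", "checkout"], reference))

# Fully reversed order: every pair is inverted, so tau is -1.0.
print(kendall_tau(list(reversed(reference)), reference))

# Two of three predicted tools are correct; two of four reference tools found.
print(set_f1(["login", "search", "pay"], reference))
```

Under these definitions a negative τ means more tool pairs are inverted than correctly ordered, so a method with negative τ orders tools worse than a random permutation would in expectation.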