JTPRO: A Joint Tool-Prompt Reflective Optimization Framework for Language Agents

arXiv cs.AI / 4/23/2026


Key Points

  • The paper argues that LLM agents with many external, domain-specific tools often fail due to generic “one-size-fits-all” prompts and tool schemas that are underspecified for when to use each tool and how to format arguments.
  • It proposes JTPRO (Joint Tool-Prompt Reflective Optimization), which uses rollout-driven reflection in trace-supervised settings to jointly optimize both global agent instructions and per-tool schema/argument descriptions.
  • The framework aims to keep only tool-local cues needed for correct disambiguation and slot/value filling, improving reliability even in large tool inventories.
  • Experiments on multi-tool benchmarks measure Tool Selection Accuracy (TSA), Slot Filling Accuracy (SFA), and Overall Success Rate (OSR), with JTPRO outperforming strong baselines and reflective optimizers like GEPA by 5%–20% (relative) on OSR.
  • Ablation results indicate that jointly optimizing instructions and tool schemas is more effective and robust than optimizing either component alone.
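The rollout-driven reflective loop described in the key points can be sketched as a simple iterate-reflect-patch cycle. This is an illustrative reconstruction, not the paper's actual algorithm: the helper names (`run_rollouts`, `reflect_and_update`), the toy word-overlap agent, and the string-appending critic are all assumptions made for the example; in the real setting both the agent and the reflector would be LLM calls.

```python
from dataclasses import dataclass

@dataclass
class AgentConfig:
    instructions: str   # global agent prompt
    tool_schemas: dict  # tool name -> natural-language description

def run_rollouts(config, tasks, agent_fn):
    """Run the agent on each task; record whether the gold tool was selected."""
    return [(t, agent_fn(config, t)["selected_tool"] == t["gold_tool"]) for t in tasks]

def reflect_and_update(config, rollouts, critic_fn):
    """Jointly revise the instructions and per-tool descriptions from failures."""
    failures = [t for t, ok in rollouts if not ok]
    if not failures:
        return config
    new_instructions, patches = critic_fn(config, failures)
    return AgentConfig(new_instructions, {**config.tool_schemas, **patches})

# --- Toy stand-ins for the LLM agent and the LLM reflector ---
def toy_agent(config, task):
    # Pick the tool whose description overlaps most with the query words.
    q = set(task["query"].lower().split())
    return {"selected_tool": max(
        config.tool_schemas,
        key=lambda t: len(q & set(config.tool_schemas[t].lower().split())))}

def toy_critic(config, failures):
    # For each failure, add a disambiguating cue to the gold tool's description.
    patches = {}
    for task in failures:
        gold = task["gold_tool"]
        base = patches.get(gold, config.tool_schemas[gold])
        patches[gold] = base + " Use for: " + task["query"]
    return config.instructions, patches

tasks = [
    {"query": "weather forecast for Paris", "gold_tool": "weather_api"},
    {"query": "convert 10 USD to EUR", "gold_tool": "currency_api"},
]
config = AgentConfig("You are a tool-calling agent.",
                     {"weather_api": "A tool.", "currency_api": "A tool."})

before = sum(ok for _, ok in run_rollouts(config, tasks, toy_agent))
config = reflect_and_update(config, run_rollouts(config, tasks, toy_agent), toy_critic)
after = sum(ok for _, ok in run_rollouts(config, tasks, toy_agent))
print(before, after)
```

In this toy run, reflection patches only the tool description that caused the failure (a "tool-local cue"), and the second rollout succeeds on both tasks; JTPRO's point is that such patches and the global instructions are optimized jointly rather than one at a time.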

Abstract

Large language model (LLM) agents augmented with external tools often struggle as the number of tools grows large and domain-specific. In such settings, ambiguous tool descriptions and under-specified agent instructions frequently lead to tool mis-selection and incorrect slot/value instantiation. We hypothesize two root causes: generic, one-size-fits-all prompts that ignore tool-specific nuances, and underspecified tool schemas that lack clear guidance on when and how to use each tool and how to format its parameters. We introduce Joint Tool-Prompt Reflective Optimization (JTPRO), a framework for improving tool-calling reliability in trace-supervised settings that iteratively uses rollout-driven reflection to co-optimize global instructions and per-tool schema/argument descriptions for accurate tool selection and argument instantiation in large tool inventories. JTPRO is designed to preserve only the tool-local cues needed for correct disambiguation and slot filling. We evaluate JTPRO across multi-tool benchmarks with varying numbers of tools, using three metrics: Tool Selection Accuracy (TSA), Slot Filling Accuracy (SFA), and Overall Success Rate (OSR; correct tool + correct slots + correct values). JTPRO consistently outperforms strong baselines, including CoT-style agents and reflective prompt optimizers such as GEPA, by 5%-20% (relative) on OSR. Ablations show that joint optimization of instructions and tool schemas is more effective and robust than optimizing either component in isolation.
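As a concrete reading of the three metrics, a minimal scorer might look like the following. The exact definitions in the paper may differ (for instance, whether SFA is conditioned on having selected the correct tool); the function below is a hedged sketch under that assumption, with hypothetical data structures.

```python
def evaluate(predictions, references):
    """Toy TSA / SFA / OSR scorer over per-example tool calls.

    Each item is {"tool": name, "args": {slot: value}}. Here SFA requires the
    correct tool plus the correct slot *names*; OSR additionally requires the
    correct slot values (correct tool + correct slots + correct values).
    """
    n = len(references)
    tsa = sfa = osr = 0
    for pred, ref in zip(predictions, references):
        tool_ok = pred["tool"] == ref["tool"]
        slots_ok = tool_ok and set(pred["args"]) == set(ref["args"])
        values_ok = slots_ok and pred["args"] == ref["args"]
        tsa += tool_ok
        sfa += slots_ok
        osr += values_ok
    return tsa / n, sfa / n, osr / n

gold = [{"tool": "weather_api", "args": {"city": "Paris"}}] * 3
preds = [
    {"tool": "weather_api", "args": {"city": "Paris"}},  # fully correct
    {"tool": "weather_api", "args": {"city": "Lyon"}},   # right tool and slot, wrong value
    {"tool": "search", "args": {"city": "Paris"}},       # wrong tool
]
tsa, sfa, osr = evaluate(preds, gold)
```

On this toy data the three metrics decompose the failure modes cleanly: TSA = 2/3, SFA = 2/3, OSR = 1/3, matching the abstract's framing of OSR as the strictest, fully-compounded criterion.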