PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning

Apple Machine Learning Journal / 5/4/2026

Key Points

  • Multi-tool-integrated reasoning lets LLM agents alternate natural-language reasoning with external tool calls, but training with only outcome-level rewards creates credit-assignment ambiguity.
  • The PORTool paper introduces an importance-aware policy optimization approach that uses step-level reward assignment to clarify which intermediate actions and tool decisions contribute to success or failure.
  • PORTool generates a “rewarded tree” to structure and distribute learning signals across reasoning/tool-use steps rather than treating the whole episode as a single reward.
  • The method aims to reinforce agents’ tool-use competence under outcome-level supervision, improving learning effectiveness for complex tool-using tasks.

Multi-tool-integrated reasoning enables LLM-empowered tool-use agents to solve complex tasks by interleaving natural-language reasoning with calls to external tools. However, training such agents using outcome-only rewards suffers from credit-assignment ambiguity, obscuring which intermediate steps (or tool-use decisions) lead to success or failure. In this paper, we propose PORTool, an importance-aware policy-optimization algorithm that reinforces agents’ tool-use competence from outcome-level supervision while assigning reward at the step level. Specifically, PORTool generates a rewarded…

Continue reading this article on the original site.