PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning

arXiv cs.CL / 5/4/2026

💬 OpinionIdeas & Deep AnalysisModels & Research

共有:

Key Points

PORTool is an importance-aware policy-optimization method for multi-tool integrated reasoning agents that helps address credit-assignment ambiguity from outcome-only rewards.
The approach builds a rewarded rollout tree where trajectories share prefixes, allowing step-level comparisons of different tool-use decisions within the same context.
PORTool estimates the importance of each step using a correctness-dominant signal (whether descendants can reach a correct final answer) plus an auxiliary term for tool-call formatting and successful execution.
Experiments on tool-use reasoning show higher final-answer accuracy and fewer tool-call steps than existing policy-optimization baselines, with ablations supporting the robustness of the step-wise importance estimates.

Abstract

Multi-tool-integrated reasoning enables LLM-empowered tool-use agents to solve complex tasks by interleaving natural-language reasoning with calls to external tools. However, training such agents from outcome-only rewards suffers from credit-assignment ambiguity, obscuring which intermediate tool-use decisions drive success or failure. In this paper, we propose PORTool, an importance-aware policy-optimization algorithm that reinforces agents' tool-use competence from outcome-level supervision while assigning reward at the step level. Specifically, PORTool generates a rewarded rollout tree in which trajectories share prefixes before branching, enabling direct comparisons among alternative tool-use decisions within the same context. It then estimates each step's importance by a correctness-dominant signal, i.e., whether descendants of that step can ultimately produce a correct final answer, plus an auxiliary term indicating whether the step's tool calls satisfy formatting constraints and execute successfully. Using these step-wise importance estimates, PORTool updates the policy to generate efficient tool-call steps, guided by both local comparisons within each branching decision and the overall quality of entire trajectories. Experiments show that PORTool improves final-answer accuracy while reducing tool-call steps compared with state-of-the-art policy-optimization baselines, and ablation studies confirm the robustness of the proposed step-wise importance estimates.

AnnouncementsBuilding a new enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs

Anthropic News

Dara Khosrowshahi on replacing Uber drivers — and himself — with AI

The Verge

CLMA Frame Test

Dev.to

You Are Right — You Don't Need CLAUDE.md

Dev.to

Governance and Liability in AI Agents: What I Built Trying to Answer Those Questions

Dev.to

PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning

Key Points

Abstract

Related Articles

AnnouncementsBuilding a new enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs

Dara Khosrowshahi on replacing Uber drivers — and himself — with AI

CLMA Frame Test

You Are Right — You Don't Need CLAUDE.md

Governance and Liability in AI Agents: What I Built Trying to Answer Those Questions

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer