When to Trust Tools? Adaptive Tool Trust Calibration For Tool-Integrated Math Reasoning
arXiv cs.CL / 4/10/2026
💬 Opinion | Ideas & Deep Analysis | Tools & Practical Usage | Models & Research
Key Points
- The paper analyzes tool-integrated reasoning (TIR) in large reasoning models and finds that models tend either to over-trust their internal reasoning when it conflicts with tool outputs or to ignore correct tool results (the “Tool Ignored” failure mode).
- It argues that current tool-integrated models lack a reliable mechanism to decide when to trust or disregard tool execution outcomes.
- To address this, the authors propose Adaptive Tool Trust Calibration (ATTC), which adaptively decides whether to trust tool results using the confidence score of generated code blocks.
- Experiments across multiple open-source TIR models, dataset types, and model sizes show ATTC reduces the “Tool Ignored” failure mode and improves overall performance by 4.1% to 7.5%.
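
The trust decision described in the key points can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not say how the confidence score is computed or how it is thresholded, so the sketch assumes the score is the geometric-mean token probability of the generated code block and that a fixed cutoff decides the outcome. The names `code_block_confidence`, `choose_answer`, and the `threshold` value of 0.8 are all hypothetical.

```python
import math
from typing import Sequence


def code_block_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability of a generated code block.

    ASSUMPTION: the summary only says a "confidence score of generated
    code blocks" is used; this particular formula is a stand-in.
    """
    if not token_logprobs:
        return 0.0  # no code tokens means no evidence to trust the tool
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def choose_answer(
    model_answer: str,
    tool_answer: str,
    token_logprobs: Sequence[float],
    threshold: float = 0.8,  # hypothetical cutoff, not taken from the paper
) -> str:
    """Pick between the model's own answer and the tool's result."""
    if model_answer == tool_answer:
        return tool_answer  # no conflict, nothing to calibrate
    confidence = code_block_confidence(token_logprobs)
    # High confidence in the code: trust the tool's output.
    # Low confidence: fall back to the model's internal reasoning.
    return tool_answer if confidence >= threshold else model_answer


# Illustrative log-probabilities for two generated code blocks.
confident_code = [-0.01, -0.02, -0.03]
uncertain_code = [-1.0, -2.0, -0.5]

trusted = choose_answer("41", "42", confident_code)
distrusted = choose_answer("41", "42", uncertain_code)
```

With these illustrative inputs, the first call keeps the tool's result and the second falls back to the model's own answer. In practice the cutoff would need tuning per model and dataset.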



