Does RL Expand the Capability Boundary of LLM Agents? A PASS@(k,T) Analysis
arXiv cs.LG / 4/17/2026
Key Points
- The paper asks whether reinforcement learning (RL) truly expands the capability boundary of LLM agents or only improves reliability, extending prior “pass@k convergence” results from static reasoning to agentic tool use.
- It introduces a new metric, PASS@(k,T), that jointly evaluates the sampling budget (k) and the interaction depth (T) to disentangle capability gains from efficiency gains.
- The authors find that RL expands the capability boundary for tool-using agents: the RL pass curve rises above the base model’s, and the gap grows at larger k rather than converging.
- This capability expansion is most pronounced on compositional, sequential information-gathering tasks, while on simpler tasks RL behaves as earlier work would predict (i.e., less boundary expansion).
- With matched training data, supervised fine-tuning actually regresses on the same compositional tasks. Mechanism analysis suggests RL works by reweighting the base model's strategy distribution toward choices that more often lead to correct downstream reasoning, especially when integrating retrieved information.
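To make the metric concrete, here is a minimal sketch of how a PASS@(k,T)-style estimate could be computed per task, assuming it generalizes the standard unbiased pass@k estimator (Chen et al., 2021) by only counting rollouts that succeed within an interaction budget of T turns. The function names and the `(success, turns_used)` rollout format are illustrative assumptions, not the paper's actual implementation.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn without replacement from n rollouts (c correct) succeeds."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def pass_at_k_T(rollouts, k: int, T: int) -> float:
    """Sketch of a PASS@(k,T)-style estimate for a single task.

    `rollouts` is a list of (success, turns_used) pairs; a rollout only
    counts as correct if it succeeds within T interaction turns.
    (Hypothetical formulation -- the paper's exact definition may differ.)
    """
    n = len(rollouts)
    c = sum(1 for ok, turns in rollouts if ok and turns <= T)
    return pass_at_k(n, c, k)

# Example: 8 rollouts for one task; 3 succeed, but one needs more
# than T=5 turns, so only 2 count toward the estimate.
rollouts = [(True, 3), (True, 4), (True, 7), (False, 5),
            (False, 2), (False, 6), (False, 5), (False, 4)]
print(pass_at_k_T(rollouts, k=2, T=5))
```

Sweeping k (sampling budget) and T (interaction depth) on estimates like this is what lets the paper separate capability gains (curves that stay apart as k grows) from pure reliability gains (curves that converge).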