RAGEN-2: Reasoning Collapse in Agentic RL
arXiv cs.LG / 4/9/2026
Key Points
- RAGEN-2 identifies a new failure mode in multi-turn LLM agent reinforcement learning, "template collapse," in which models produce reasoning that looks diverse but is actually input-agnostic, so entropy-based metrics fail to detect it.
- The paper proposes decomposing reasoning quality into within-input diversity (measured by entropy) and cross-input distinguishability (measured via mutual information), showing that mutual information correlates more strongly with final task performance than entropy across multiple tasks.
- To enable online diagnosis, RAGEN-2 introduces mutual-information proxy metrics and demonstrates their effectiveness in detecting when reasoning stops responding to different inputs.
- The authors explain template collapse using a signal-to-noise ratio (SNR) mechanism: low reward variance weakens the useful task gradients, so regularization dominates and erases cross-input differences.
- As a mitigation, the paper introduces SNR-Aware Filtering, which uses reward variance to select high-signal prompts at each iteration, improving both input dependence and performance in planning, math, web navigation, and code execution.
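
The two ideas in the key points can be illustrated with a small sketch. It is not the paper's implementation. It assumes reasoning traces have already been mapped to discrete labels, that within-input diversity is the conditional entropy H(Z|X), and that cross-input distinguishability is the mutual information I(X;Z) = H(Z) - H(Z|X). The filtering rule (keep the top fraction of prompts ranked by reward variance) is also an assumption, since the summary does not say how prompts are selected. The names `diversity_and_mi`, `select_high_signal`, and `keep_fraction` are hypothetical.

```python
import math
from collections import Counter
from statistics import pvariance


def entropy(counts):
    """Shannon entropy (in nats) of a Counter of discrete outcomes."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def diversity_and_mi(samples):
    """Return (H(Z|X), I(X;Z)) for discrete reasoning labels.

    samples maps an input id to the list of reasoning labels sampled for it.
    Inputs are weighted by how many samples they contributed.
    """
    n = sum(len(zs) for zs in samples.values())
    marginal = Counter(z for zs in samples.values() for z in zs)
    cond = sum(len(zs) / n * entropy(Counter(zs)) for zs in samples.values())
    return cond, entropy(marginal) - cond


def select_high_signal(rewards, keep_fraction=0.5):
    """Keep the prompts whose rollout rewards vary the most.

    rewards maps a prompt id to the list of rewards from its rollouts.
    Top-fraction selection is an assumed rule, chosen for illustration.
    """
    ranked = sorted(rewards, key=lambda p: pvariance(rewards[p]), reverse=True)
    k = max(1, math.ceil(keep_fraction * len(ranked)))
    return ranked[:k]


# Toy data: "collapsed" samples the same four templates for every input,
# "responsive" uses different templates for different inputs.
collapsed = {"x1": ["a", "b", "c", "d"], "x2": ["a", "b", "c", "d"]}
responsive = {"x1": ["a", "a", "b", "b"], "x2": ["c", "c", "d", "d"]}

h_collapsed, mi_collapsed = diversity_and_mi(collapsed)
h_responsive, mi_responsive = diversity_and_mi(responsive)

# Prompts p1 and p3 always get the same reward, so they carry no signal.
rollout_rewards = {
    "p1": [0, 0, 0, 0],
    "p2": [0, 1, 0, 1],
    "p3": [1, 1, 1, 1],
    "p4": [0, 0, 0, 1],
}
selected = select_high_signal(rollout_rewards, keep_fraction=0.5)

if __name__ == "__main__":
    print("collapsed:  H(Z|X) =", h_collapsed, " I(X;Z) =", mi_collapsed)
    print("responsive: H(Z|X) =", h_responsive, " I(X;Z) =", mi_responsive)
    print("selected prompts:", selected)
```

In the toy data, the collapsed policy has higher within-input entropy than the responsive one but essentially zero mutual information with the input, which is the pattern an entropy-only monitor would miss. The filter drops the two prompts whose rewards never vary.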