A Decomposition Perspective to Long-context Reasoning for LLMs

arXiv cs.CL / 4/10/2026


Key Points

  • The paper argues that long-context reasoning failures in LLMs come partly from researchers treating the task holistically rather than analyzing its internal structure.
  • It decomposes long-context reasoning into multiple atomic skills and generates targeted pseudo-datasets to isolate and train each skill.
  • The authors find that scores on these atomic skills are strongly correlated with overall long-context reasoning performance across benchmarks.
  • Using reinforcement learning on the pseudo-datasets, the method improves the model’s atomic skills and yields better general long-context reasoning results.
  • Experiments across several benchmarks show an average gain of 7.7 percentage points (from 46.3% to 54.0%), indicating the approach is effective and generalizable.

Abstract

Long-context reasoning is essential for complex real-world applications, yet remains a significant challenge for Large Language Models (LLMs). Despite the rapid evolution in long-context reasoning, current research often overlooks the internal complexity of the long-context reasoning task itself. In this paper, we move beyond this holistic view and decompose long-context reasoning into a set of fundamental atomic skills, and we then automatically synthesize a suite of pseudo datasets, each explicitly targeting a specific atomic skill. Our empirical analysis confirms that proficiency in these atomic skills is strongly correlated with general long-text reasoning performance. Building on this insight, we employ reinforcement learning on these pseudo datasets to sharpen the model's atomic skills, in the hope of boosting its general long-context reasoning ability. Extensive experiments across multiple benchmarks demonstrate the effectiveness of our approach: it outperforms a strong baseline by an average margin of 7.7% (improving from 46.3% to 54.0%) across Loogle, Loong, LongBench-v2, BrowscompLong, Ruler-qa2, and MRCR.