SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking

arXiv cs.CL / 4/10/2026


Key Points

  • The paper introduces Stepwise Adaptive Thinking (SAT) to reduce “overthinking” in Large Reasoning Models by pruning reasoning at the step level without breaking the underlying logic.
  • SAT models the reasoning process as a finite-state machine with modes (Slow/Normal/Fast/Skip) that adapt dynamically to step difficulty.
  • A lightweight Process Reward Model (PRM) guides state transitions, compressing easy steps while retaining depth on harder ones.
  • Experiments across 9 LRMs and 7 benchmarks report up to a 40% reduction in reasoning tokens, with accuracy generally maintained or improved.
  • The approach aims to balance token efficiency and fine-grained control, addressing trade-offs seen in prior methods that optimized token use at the cost of reasoning integrity.

Abstract

Large Reasoning Models (LRMs) have revolutionized complex problem-solving, yet they exhibit pervasive "overthinking," generating unnecessarily long reasoning chains. While current solutions improve token efficiency, they often sacrifice fine-grained control or risk disrupting the logical integrity of the reasoning process. To address this, we introduce Stepwise Adaptive Thinking (SAT), a framework that performs step-level, difficulty-aware pruning while preserving the core reasoning structure. SAT formulates reasoning as a Finite-State Machine (FSM) with distinct thinking modes (Slow, Normal, Fast, Skip). It navigates these states dynamically using a lightweight Process Reward Model (PRM), compressing easy steps while preserving depth for hard ones. Experiments across 9 LRMs and 7 benchmarks show that SAT achieves up to 40% reduction in reasoning tokens while generally maintaining or improving accuracy.
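To make the FSM-plus-PRM idea concrete, the controller described above can be sketched as follows. The four mode names come from the paper, but everything else here is an illustrative assumption: the `next_mode` function, the score thresholds, and the one-step-at-a-time transition rule are not the authors' actual design, just a minimal sketch of how a PRM score could drive mode transitions.

```python
from enum import Enum

class Mode(Enum):
    """Thinking modes from the SAT paper, ordered from most to least deliberate."""
    SLOW = 0
    NORMAL = 1
    FAST = 2
    SKIP = 3

# Ordered from deepest reasoning to full compression.
_ORDER = [Mode.SLOW, Mode.NORMAL, Mode.FAST, Mode.SKIP]

def next_mode(current: Mode, prm_score: float,
              easy: float = 0.8, hard: float = 0.3) -> Mode:
    """Hypothetical transition rule: a high PRM score (easy, reliable step)
    shifts one mode toward SKIP to compress tokens; a low score (hard or
    shaky step) shifts one mode toward SLOW to preserve reasoning depth.
    Thresholds `easy` and `hard` are assumed values, not from the paper."""
    i = _ORDER.index(current)
    if prm_score >= easy:
        i = min(i + 1, len(_ORDER) - 1)  # compress: move toward Skip
    elif prm_score <= hard:
        i = max(i - 1, 0)                # deepen: move toward Slow
    return _ORDER[i]                     # mid-range score: stay put
```

For example, a step the PRM scores at 0.9 would push a `NORMAL`-mode generator into `FAST`, while a 0.1 score would pull it back into `SLOW`; scores in between leave the mode unchanged.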