Where Reasoning Breaks: Logic-Aware Path Selection by Controlling Logical Connectives in LLMs Reasoning Chains

arXiv cs.CL / 4/23/2026


Key Points

  • The paper argues that LLM reasoning is fragile in multi-step logical deduction because small transition errors can cascade through the entire reasoning chain.
  • Empirical evidence suggests that logical connective tokens are high-entropy “forking points,” where models often struggle to choose the correct logical direction.
  • The authors hypothesize that explicitly intervening in logical connective selection can steer LLMs toward more correct reasoning paths.
  • They propose a multi-layer framework combining gradient-based logical steering, localized branching with targeted look-ahead search, and token-level transition preference optimization using reinforcement learning at logic-critical pivots.
  • The framework targets only logic-critical transitions to improve the accuracy–efficiency trade-off relative to global approaches such as beam search and self-consistency.
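To make the "high-entropy forking point" idea concrete, here is a minimal sketch of how one might flag logic-critical pivots: compute the entropy of the model's next-token distribution and flag connective tokens whose entropy exceeds a threshold. The connective vocabulary, the threshold value, and the function names are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Hypothetical connective set; the paper's actual token inventory is not specified here.
CONNECTIVES = {"therefore", "because", "however", "so", "thus"}

def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over next tokens."""
    z = logits - logits.max()           # stabilize before exponentiating
    p = np.exp(z) / np.exp(z).sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def find_forking_points(tokens, logits_per_step, threshold=1.0):
    """Flag positions where a connective token coincides with high next-token entropy."""
    pivots = []
    for i, (tok, logits) in enumerate(zip(tokens, logits_per_step)):
        if tok in CONNECTIVES and token_entropy(logits) > threshold:
            pivots.append(i)
    return pivots

# Toy example: peaked distributions (confident) vs. a flat one (uncertain).
tokens = ["x", "therefore", "y", "however"]
logits = [np.array([5.0, 0.0, 0.0]),   # non-connective, ignored
          np.array([5.0, 0.0, 0.0]),   # connective, but low entropy
          np.array([0.0, 0.0, 5.0]),   # non-connective
          np.array([1.0, 1.0, 1.0])]   # connective, maximally uncertain
print(find_forking_points(tokens, logits))  # → [3]
```

Only position 3 qualifies: it is a connective whose distribution is uniform (entropy ≈ 1.10 nats), while the confident connective at position 1 falls below the threshold.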

Abstract

While LLMs demonstrate impressive reasoning capabilities, they remain fragile in multi-step logical deduction, where a single transition error can propagate through the entire reasoning chain, leading to unstable performance. In this work, we identify logical connectives as primary points of this structural fragility. Through empirical analysis, we show that connective tokens function as high-entropy forking points, at which models frequently struggle to determine the correct logical direction. Motivated by this observation, we hypothesize that intervening in logical connective selection can guide LLMs toward the correct logical direction, thereby improving the overall reasoning chain. To validate this hypothesis, we propose a multi-layered framework that intervenes specifically at these logic-critical junctions in the reasoning process. Our framework includes (1) Gradient-based Logical Steering to guide an LLM's internal representations toward valid reasoning subspaces, (2) Localized Branching to resolve ambiguity via targeted look-ahead search, and (3) Targeted Transition Preference Optimization, a surgical reinforcement learning objective that selectively optimizes single-token preferences at logical pivots. Crucially, by concentrating intervention solely on logic-critical transitions, our framework achieves a favorable accuracy–efficiency trade-off compared to global inference-time scaling methods such as beam search and self-consistency.
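The Localized Branching component described in the abstract can be sketched as follows: rather than beam-searching over every token, the search branches only at a flagged pivot, rolls each candidate connective forward a few steps, and keeps the highest-scoring continuation. The `rollout` and `score` callables below are toy stand-ins for the LLM's own generation and a path scorer; the paper's actual scoring procedure is not reproduced here.

```python
def localized_branching(prefix, candidates, rollout, score, k=2):
    """At a logic-critical pivot, expand the top-k candidate connectives,
    look ahead with `rollout`, and return the best-scoring full path."""
    branches = [(c, rollout(prefix + [c])) for c in candidates[:k]]
    best_connective, best_cont = max(branches, key=lambda b: score(b[1]))
    return prefix + [best_connective] + best_cont

# Toy rollout/scorer: a real system would generate and score with the LLM itself.
rollout = lambda seq: ["step"] if seq[-1] == "therefore" else ["misstep"]
score = lambda cont: 1.0 if "step" in cont else 0.0

path = localized_branching(["premise"], ["however", "therefore"], rollout, score)
print(path)  # → ['premise', 'therefore', 'step']
```

Because branching happens only at pivots, the look-ahead cost scales with the number of flagged connectives rather than with sequence length, which is the source of the accuracy–efficiency trade-off the authors claim over global beam search.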