Cognitive Loop of Thought: Reversible Hierarchical Markov Chain for Efficient Mathematical Reasoning

arXiv cs.CL / 4/9/2026


Key Points

  • The paper introduces Cognitive Loop of Thought (CLoT), a reversible hierarchical Markov chain–based framework designed to make long chain-of-thought reasoning more computationally efficient for LLMs.
  • CLoT addresses prior Markov/long-CoT approaches’ weaknesses by combining hierarchical sub-problem decomposition, backward verification at each layer, and pruning of redundant lower-level steps after higher-level verification.
  • A new instruction-style backward reasoning dataset, CLoT-Instruct, is proposed to support the framework’s backward reasoning and verification mechanism.
  • Experiments on four mathematical benchmarks show improved robustness and reduced error propagation, with a reported 99.0% accuracy on AddSub using GPT-4o-mini, outperforming traditional CoT and CoT-SC by 4.1% and 2.9%, respectively.
  • Overall, the work aims to maintain reasoning quality while curbing the sequence-length growth and KV-cache inefficiency that hinder the widespread use of long chain-of-thought.
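
The decompose–verify–prune loop described above can be illustrated with a toy sketch. This is not the paper's implementation: the "solver" is a stand-in (summing children), the backward check is simple arithmetic residue, and all names (`Sub`, `solve`) are hypothetical. It only shows the control flow CLoT describes: solve lower-level sub-problems first, verify the layer backward, then prune the verified lower levels so their context need not be retained.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Sub:
    """A sub-problem node; leaves carry known values (toy stand-in)."""
    value: Optional[int] = None
    children: List["Sub"] = field(default_factory=list)

def solve(node: Sub) -> int:
    if not node.children:
        return node.value  # leaf: already answered
    # Hierarchical decomposition: answer lower-level sub-problems first.
    answers = [solve(c) for c in node.children]
    node.value = sum(answers)  # toy "solver" for this layer
    # Backward verification at this layer: derive the inputs back from
    # the answer; a nonzero residue would flag an error at this level.
    residue = node.value - sum(answers)
    assert residue == 0, "backward check failed: re-solve this layer"
    # Pruning: the layer is verified, so its lower-level steps (and the
    # context/KV entries they occupied) can be discarded.
    node.children.clear()
    return node.value

root = Sub(children=[Sub(2), Sub(3), Sub(children=[Sub(4), Sub(5)])])
print(solve(root))        # → 14
print(len(root.children)) # → 0 (verified lower levels were pruned)
```

The point of the sketch is the ordering: verification happens per layer, not only at the end, so an error is caught before it propagates upward, and memory is reclaimed as soon as a layer is confirmed.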

Abstract

Multi-step Chain-of-Thought (CoT) has significantly advanced the mathematical reasoning capabilities of LLMs by leveraging explicit reasoning steps. However, the widespread adoption of Long CoT often results in sequence lengths that exceed manageable computational limits. While existing approaches attempt to alleviate this by reducing KV Cache redundancy via Markov chain-like structures, they introduce two critical limitations: inherent memorylessness (loss of context) and limited backward reasoning capability. To address these limitations, we propose a novel Chain-of-Thought framework based on a Reversible Hierarchical Markov Chain, termed Cognitive Loop of Thought (CLoT), and a backward reasoning dataset, CLoT-Instruct. In CLoT, problems are decomposed into sub-problems with hierarchical dependencies. Inspired by human cognitive processes, we introduce a backward verification mechanism at each hierarchical layer. Furthermore, we implement a pruning strategy: once higher-level sub-problems are verified, redundant lower-level sub-problems are pruned to maximize efficiency. This approach effectively mitigates error propagation and enhances reasoning robustness. Experiments on four mathematical benchmarks demonstrate the effectiveness of our method. Notably, on the AddSub dataset using GPT-4o-mini, CLoT achieves 99.0% accuracy, outperforming traditional CoT and CoT-SC by 4.1% and 2.9%, respectively.