Beyond Meta-Reasoning: Metacognitive Consolidation for Self-Improving LLM Reasoning

arXiv cs.AI / 4/21/2026


Key Points

  • The paper argues that current meta-reasoning methods are largely episodic and fail to accumulate reusable meta-cognitive skills across different problem instances.
  • It proposes “Metacognitive Consolidation,” a framework that consolidates a model’s metacognitive experiences from past reasoning into reusable knowledge for improved future meta-reasoning.
  • The approach structures each instance’s problem solving into separate roles (reasoning, monitoring, and control) to produce rich, attributable meta-level traces.
  • Those traces are integrated via a hierarchical, multi-timescale update mechanism that gradually builds evolving meta-knowledge.
  • Experiments report consistent gains across multiple benchmarks and model backbones, with performance improving as the metacognitive experience accumulates over time.

Abstract

Large language models (LLMs) have demonstrated strong reasoning capabilities, and as existing approaches for enhancing LLM reasoning continue to mature, increasing attention has shifted toward meta-reasoning as a promising direction for further improvement. However, most existing meta-reasoning methods remain episodic: they focus on executing complex meta-reasoning routines within individual instances, but ignore the accumulation of reusable meta-reasoning skills across instances, leading to recurring failure modes and repeatedly high metacognitive effort. In this paper, we introduce Metacognitive Consolidation, a novel framework in which a model consolidates metacognitive experience from past reasoning episodes into reusable knowledge that improves future meta-reasoning. We instantiate this framework by structuring instance-level problem solving into distinct roles for reasoning, monitoring, and control to generate rich, attributable meta-level traces. These traces are then consolidated through a hierarchical, multi-timescale update mechanism that gradually forms evolving meta-knowledge. Experimental results demonstrate consistent performance gains across benchmarks and backbone models, and show that performance improves as metacognitive experience accumulates over time.
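To make the framework concrete, here is a minimal toy sketch of the role decomposition and multi-timescale consolidation the abstract describes. Everything in it (the `Trace` and `MetaKnowledge` structures, the tally-based distillation rule, the role stubs) is a hypothetical illustration, not the authors' actual implementation:

```python
# Hypothetical sketch: role-structured solving plus hierarchical,
# multi-timescale consolidation. All names and rules are illustrative
# assumptions, not the paper's implementation.
from dataclasses import dataclass, field
from collections import Counter

@dataclass
class Trace:
    """One instance's attributable meta-level trace."""
    problem: str
    steps: list          # produced by the reasoning role
    flags: list          # issues raised by the monitoring role
    interventions: list  # corrections applied by the control role

@dataclass
class MetaKnowledge:
    """Consolidated metacognitive experience at two timescales."""
    fast: Counter = field(default_factory=Counter)  # per-episode tallies
    slow: dict = field(default_factory=dict)        # distilled reusable rules

def solve_instance(problem: str, meta: MetaKnowledge) -> Trace:
    # Reasoning role: propose steps (stubbed as a naive token split).
    steps = problem.split()
    # Monitoring role: flag steps matching consolidated failure patterns.
    flags = [s for s in steps if s in meta.slow]
    # Control role: intervene on flagged steps using reusable rules.
    interventions = [meta.slow[s] for s in flags]
    return Trace(problem, steps, flags, interventions)

def consolidate(traces, meta: MetaKnowledge, slow_every: int = 3):
    """Multi-timescale update: fast tallies accumulate every episode;
    slow, reusable rules are distilled only every `slow_every` episodes."""
    for i, tr in enumerate(traces, 1):
        meta.fast.update(tr.steps)               # fast timescale
        if i % slow_every == 0:                  # slow timescale
            for step, count in meta.fast.items():
                if count >= slow_every:          # recurring pattern -> rule
                    meta.slow[step] = f"double-check '{step}'"
    return meta
```

The point of the sketch is the interface, not the toy rules: monitoring and control consume consolidated meta-knowledge that grows across instances, so later episodes flag and correct patterns that earlier episodes only recorded.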