NeuReasoner: Towards Explainable, Controllable, and Unified Reasoning via Mixture-of-Neurons

arXiv cs.CL / 4/6/2026


Key Points

  • The paper argues that large reasoning models suffer recurring failure modes at three levels—within-step errors, inter-step oscillation/stagnation, and instance-level maladaptive overthinking—yet existing work treats these issues in isolation.
  • It presents a white-box analysis that uses a Mixture-of-Neurons (MoN) perspective to identify key neurons and their fluctuation patterns tied to specific failure types.
  • Based on these findings, the authors propose NeuReasoner, a unified framework aimed at making reasoning explainable and controllable via MoN-driven mechanisms.
  • NeuReasoner combines lightweight MLPs for failure detection with a special-token-triggered self-correction mechanism learned through supervised fine-tuning (SFT); during inference, these tokens are inserted upon detected failures to activate remedial behaviors.
  • Experiments across six benchmarks and six backbone model sizes (8B–70B) show up to 27.0% performance gains and 19.6%–63.3% reductions in token consumption versus nine baselines.
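
The detect-then-correct loop described in the last two points can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the probe weights are random rather than trained, and the dimensions, function names, and the `<REFLECT>` token are assumptions.

```python
import numpy as np

# Hypothetical sketch: a lightweight MLP probes the hidden state at each
# reasoning step; when it flags a likely failure, a special control token
# is inserted into the output stream to trigger self-corrective behavior.

rng = np.random.default_rng(0)
HIDDEN, PROBE = 16, 8

# Randomly initialized two-layer MLP probe (in practice, trained via SFT
# on steps labeled with the failure types identified by the MoN analysis).
W1, b1 = rng.normal(size=(HIDDEN, PROBE)), np.zeros(PROBE)
W2, b2 = rng.normal(size=(PROBE, 1)), np.zeros(1)

def failure_prob(h):
    """Sigmoid failure score from a tiny MLP over one step's hidden state."""
    z = np.maximum(h @ W1 + b1, 0.0)               # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(z @ W2 + b2)[0]))

def insert_control_tokens(steps, hidden_states, threshold=0.5, token="<REFLECT>"):
    """Append a special token after any step whose failure score exceeds threshold."""
    out = []
    for step, h in zip(steps, hidden_states):
        out.append(step)
        if failure_prob(h) > threshold:
            out.append(token)                      # actuates remedial decoding downstream
    return out
```

In the actual framework the inserted token is one the model has been fine-tuned to respond to, so its appearance steers generation toward re-checking or terminating the current line of reasoning.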

Abstract

Large Reasoning Models (LRMs) have recently achieved remarkable success in complex reasoning tasks. However, closer scrutiny reveals persistent failure modes that compromise performance and cost: I) intra-step level, marked by calculation or derivation errors; II) inter-step level, involving oscillation and stagnation; and III) instance level, causing maladaptive over-thinking. Existing endeavors target isolated levels without unification, while their black-box nature and reliance on RL hinder explainability and controllability. To bridge these gaps, we conduct an in-depth white-box analysis, identifying key neurons (Mixture of Neurons, MoN) and their fluctuation patterns associated with distinct failures. Building on these insights, we propose NeuReasoner, an explainable, controllable, and unified reasoning framework driven by MoN. Technically, NeuReasoner integrates lightweight MLPs for failure detection with a special-token-triggered self-correction mechanism learned via SFT. During inference, special tokens are inserted upon failure detection to actuate controllable remedial behaviors. Extensive evaluations across six benchmarks and six backbone models (8B–70B), against nine competitive baselines, demonstrate that NeuReasoner achieves performance gains of up to 27.0% while reducing token consumption by 19.6%–63.3%.