Awakening Dormant Experts: Counterfactual Routing to Mitigate MoE Hallucinations

arXiv cs.LG / 4/17/2026


Key Points

  • Sparse Mixture-of-Experts (MoE) models can hallucinate more often on long-tail knowledge, and the paper attributes this to static Top-k routing that prioritizes high-frequency patterns over rare but crucial associations.
  • The authors propose Counterfactual Routing (CoR), a training-free inference method that uses layer-wise perturbation analysis and a Counterfactual Expert Impact (CEI) metric to reallocate expert computation dynamically.
  • CoR “awakens” dormant specialist experts by virtually ablating syntax-dominant pathways, shifting resources toward knowledge-intensive layers while keeping the total activation count constant.
  • Experiments on TruthfulQA, FACTOR, and TriviaQA show CoR improves factual accuracy by an average of 3.1% without increasing the inference budget, yielding a better accuracy–compute tradeoff than static scaling.
  • The work suggests a practical mitigation strategy for MoE hallucinations that can be added at inference time rather than requiring retraining or model redesign.
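The paper itself does not publish its formulas here, but the "virtual ablation" idea behind the Counterfactual Expert Impact metric can be illustrated with a toy sketch. The sketch below is a hypothetical reading, not the authors' implementation: it scores each expert by how much the layer's output changes when that expert's gate is zeroed before Top-k selection, so a low-gated expert whose removal would nonetheless reshape the output registers as causally important.

```python
import numpy as np

def moe_layer(x, experts, gates, k):
    """Standard Top-k MoE layer: route x through the k highest-gated experts."""
    top = np.argsort(gates)[-k:]               # indices of the k largest gates
    w = gates[top] / gates[top].sum()          # renormalize the selected gates
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

def counterfactual_expert_impact(x, experts, gates, k):
    """Hypothetical CEI score (illustrative only): for each expert, virtually
    ablate it by zeroing its gate, re-run routing, and measure how far the
    layer output moves. Larger shift = more causally decisive expert."""
    base = moe_layer(x, experts, gates, k)
    cei = np.zeros(len(experts))
    for e in range(len(experts)):
        g = gates.copy()
        g[e] = 0.0                             # virtual ablation of expert e
        cei[e] = np.linalg.norm(base - moe_layer(x, experts, g, k))
    return cei
```

Note that under this toy definition, ablating an expert outside the current Top-k set leaves routing unchanged (CEI of zero), whereas ablating a selected expert forces a replacement into the active set; a practical scoring rule would need to probe dormant experts more directly, which is presumably where the paper's layer-wise perturbation analysis comes in.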

Abstract

Sparse Mixture-of-Experts (MoE) models have achieved remarkable scalability, yet they remain vulnerable to hallucinations, particularly when processing long-tail knowledge. We identify that this fragility stems from static Top-k routing: routers tend to favor high-frequency patterns over rare factual associations. Consequently, "specialist experts" possessing critical long-tail knowledge are often assigned low gating scores and remain "dormant": under-prioritized for specific tokens despite their proven causal importance on other inputs. To address this, we propose Counterfactual Routing (CoR), a training-free inference framework designed to awaken these dormant experts. CoR integrates layer-wise perturbation analysis with the Counterfactual Expert Impact (CEI) metric to dynamically shift computational resources from syntax-dominant to knowledge-intensive layers while maintaining a constant total activation count, effectively retrieving causally decisive experts via virtual ablation. Extensive experiments on TruthfulQA, FACTOR, and TriviaQA demonstrate that CoR improves factual accuracy by 3.1% on average without increasing the inference budget, establishing a superior Pareto frontier compared to static scaling strategies.
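The constant-budget constraint in the abstract, shifting compute from syntax-dominant to knowledge-intensive layers without changing the total number of activated experts, can be sketched as a simple reallocation rule. Everything below (the `shift` parameter, using per-layer impact scores as the ranking signal) is an assumption for illustration, not the paper's algorithm.

```python
import numpy as np

def reallocate_topk(base_k, layer_scores, shift=2):
    """Hypothetical budget shift: take `shift` expert slots from the layers
    with the lowest impact scores (assumed syntax-dominant) and give them to
    the layers with the highest scores (assumed knowledge-intensive), so the
    total activation count across layers stays constant.

    Assumes shift <= len(layer_scores) // 2 so donors and receivers are
    disjoint, and base_k >= 2 so every layer keeps at least one expert."""
    k = np.full(len(layer_scores), base_k)
    order = np.argsort(layer_scores)           # layers, ascending by impact
    donors, receivers = order[:shift], order[-shift:]
    k[donors] -= 1                             # low-impact layers give up a slot
    k[receivers] += 1                          # high-impact layers gain one
    assert k.sum() == base_k * len(layer_scores)   # inference budget unchanged
    return k
```

For example, with `base_k=2` over four layers and impact scores `[0.1, 0.9, 0.2, 0.8]`, `shift=1` yields per-layer budgets `[1, 3, 2, 2]`: the same eight activations, redistributed toward the highest-impact layer.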