
Adaptive Activation Cancellation for Hallucination Mitigation in Large Language Models

arXiv cs.AI / 3/12/2026


Key Points

  • Adaptive Activation Cancellation (AAC) is an inference-time framework that treats hallucination-related activations as structured interference in the transformer residual stream and suppresses them without external knowledge, fine-tuning, or extra inference passes.
  • H-Nodes are identified via layer-wise linear probing, and a confidence-weighted forward hook is applied during autoregressive generation to surgically suppress these nodes in real time.
  • Evaluations on OPT-125M, Phi-3-mini, and LLaMA 3-8B show that the real-time hook is the sole intervention that consistently improves factual accuracy on TruthfulQA and HaluEval across scales, with no degradation in standard language modeling metrics like WikiText-103 perplexity and MMLU.
  • On LLaMA 3-8B, AAC also yields modest generation-level gains and demonstrates higher probe-space selectivity than baselines, illustrating that targeted neuron-level suppression can improve factuality while preserving overall model capability.
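The confidence-weighted forward hook described above can be sketched as follows. This is a minimal illustrative implementation, not the paper's code: the H-Node indices, probe direction, and the exact confidence-to-gain mapping (`1 - confidence`) are assumptions for demonstration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

h_node_idx = torch.tensor([2, 5])   # hypothetical H-Node channels in this layer
probe_w = torch.randn(8)            # hypothetical linear-probe direction

def aac_hook(module, inputs, output):
    # Probe confidence that each activation is hallucination-like.
    conf = torch.sigmoid(output @ probe_w)    # shape: (batch,)
    scale = 1.0 - conf.unsqueeze(-1)          # confidence-weighted gain in (0, 1)
    suppressed = output.clone()
    # Attenuate only the H-Node channels; all others pass through unchanged.
    suppressed[..., h_node_idx] = output[..., h_node_idx] * scale
    return suppressed                         # returned tensor replaces the output

layer = nn.Linear(8, 8)
handle = layer.register_forward_hook(aac_hook)

x = torch.randn(4, 8)
y = layer(x)        # H-Node channels are scaled down per-sample by probe confidence
handle.remove()
```

Because the hook runs inside the ordinary forward pass, suppression happens in real time during generation with no second inference pass, consistent with the "no additional inference passes" claim.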

Abstract

Large Language Models frequently generate fluent but factually incorrect text. We propose Adaptive Activation Cancellation (AAC), a real-time inference-time framework that treats hallucination-associated neural activations as structured interference within the transformer residual stream, drawing an explicit analogy to classical adaptive noise cancellation from signal processing. The framework identifies Hallucination Nodes (H-Nodes) via layer-wise linear probing and suppresses them using a confidence-weighted forward hook during autoregressive generation -- requiring no external knowledge, no fine-tuning, and no additional inference passes. Evaluated across OPT-125M, Phi-3-mini, and LLaMA 3-8B on TruthfulQA and HaluEval, the real-time hook is the only intervention that consistently improves downstream accuracy at all three scales. Critically, the method is strictly surgical: WikiText-103 perplexity and MMLU reasoning accuracy show 0.0% degradation across all three model scales, a property that distinguishes AAC from interventions that trade fluency or general capability for factual improvement. At the LLaMA 3-8B scale, the hook additionally yields positive generation-level gains (MC1 +0.04; MC2 +0.003; Token-F1 +0.003) while achieving probe-space selectivity 3.5x-5.94x higher than the ITI baseline -- demonstrating that targeted neuron-level suppression can simultaneously improve factual accuracy and preserve model capability.
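The H-Node identification step via layer-wise linear probing might look like the following. This is a hedged sketch on synthetic data: the activations, labels, and the choice of logistic regression with weight-magnitude ranking are illustrative assumptions, not the paper's exact probing recipe.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model, n = 16, 400

# Synthetic residual-stream activations: hallucinated samples carry extra
# signal on a few dimensions (here 3 and 7, planted by construction).
labels = rng.integers(0, 2, n)                  # 1 = hallucinated
acts = rng.normal(size=(n, d_model))
acts[np.ix_(labels == 1, [3, 7])] += 2.0

# Fit a linear probe for one layer; repeating per layer gives the
# "layer-wise" probing described in the paper.
probe = LogisticRegression(max_iter=1000).fit(acts, labels)

# Rank dimensions by |probe weight|: the top entries are candidate H-Nodes,
# and the probe itself supplies the confidence signal for the runtime hook.
h_nodes = np.argsort(-np.abs(probe.coef_[0]))[:2]
```

On this toy data the probe recovers the planted dimensions; in practice the probe's held-out accuracy per layer would guide which layers and nodes are worth intervening on.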