Adaptive Activation Cancellation for Hallucination Mitigation in Large Language Models
arXiv cs.AI / 3/12/2026
Key Points
- Adaptive Activation Cancellation (AAC) is an inference-time framework that treats hallucination-related activations as structured interference in the transformer residual stream and suppresses them without external knowledge, fine-tuning, or extra inference passes.
- Hallucination-associated neurons (H-Nodes) are identified via layer-wise linear probing, and a confidence-weighted forward hook applied during autoregressive generation suppresses these nodes in real time.
- Evaluations on OPT-125M, Phi-3-mini, and LLaMA 3-8B show that the real-time hook is the sole intervention that consistently improves factual accuracy on TruthfulQA and HaluEval across model scales, with no degradation on standard capability benchmarks such as WikiText-103 perplexity and MMLU accuracy.
- On LLaMA 3-8B, AAC also yields modest generation-level gains and demonstrates higher probe-space selectivity than baselines, illustrating that targeted neuron-level suppression can improve factuality while preserving overall model capability.
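The suppression mechanism described above can be sketched with a PyTorch forward hook. This is a minimal illustration, not the paper's implementation: the layer, the H-Node indices, and the probe confidences are all hypothetical stand-ins, and a toy `nn.Linear` substitutes for a real transformer block writing to the residual stream.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy stand-in for a transformer sublayer whose output feeds the residual stream.
layer = nn.Linear(8, 8)

# Assumed probe results (illustrative values): indices of flagged H-Nodes and
# the linear probe's confidence in [0, 1] for each one.
h_node_idx = torch.tensor([1, 4, 6])
probe_conf = torch.tensor([0.9, 0.5, 0.8])

def aac_hook(module, inputs, output):
    # Confidence-weighted suppression: scale each flagged dimension by
    # (1 - confidence), so high-confidence H-Nodes are damped most while
    # all other dimensions pass through unchanged.
    scale = torch.ones(output.shape[-1])
    scale[h_node_idx] = 1.0 - probe_conf
    return output * scale  # returning a tensor replaces the layer's output

handle = layer.register_forward_hook(aac_hook)
x = torch.randn(2, 8)
y = layer(x)       # suppressed activations
handle.remove()    # restore the unmodified model
```

In a real model the hook would be registered on the chosen decoder layers once, before generation, so suppression happens inside the single autoregressive pass rather than requiring an extra forward pass.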