The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation

arXiv cs.LG · April 28, 2026

📰 News · Models & Research

Key Points

  • Hypernetwork-based instant adaptation methods (e.g., Doc-to-LoRA) can internalize a document in one forward pass but degrade sharply on cases where the document conflicts with the model’s pretraining knowledge, dropping to 46.4% accuracy on the deepest facts.
  • The paper argues the failure is not due to representational limits: the hypernetwork targets the correct layers, yet its adapter “margin” stays roughly constant while the pretrained margin increases with training frequency, causing strong-prior conflicts to lose by construction.
  • Failure is predicted to correlate with how strongly the base model already favors the contradicted fact: on 194 conflicts ranked by base log-probability, accuracy falls from 68% (weak prior) to 16% (strong prior), a 52-point gap.
  • Proposed training-free fixes focus on “amplitude”: Selective Layer Boosting scales the adapter at its highest-norm layers, and Conflict-Aware Internalization applies boosting only when the base model is confident.
  • With these methods, deep-conflict accuracy improves from 46.4% to 71.0% on Gemma-2B and from 53.6% to 72.5% on Mistral-7B, while preserving novel-knowledge recall; the authors also release KID-Bench (489 questions) to evaluate prior-graded conflicts separately.
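The two training-free fixes can be sketched in a few lines. This is a minimal toy, not the paper's implementation: the adapter is represented as a dict of flattened per-layer delta-weight vectors, and the boost factor `alpha`, the number of boosted layers `top_k`, and the confidence threshold `tau` are all hypothetical values chosen for illustration.

```python
import math

def layer_norms(adapter):
    """L2 norm of each layer's delta weights (adapter: name -> list of floats)."""
    return {name: math.sqrt(sum(w * w for w in deltas))
            for name, deltas in adapter.items()}

def selective_layer_boost(adapter, alpha=2.0, top_k=1):
    """Selective Layer Boosting (sketch): scale the adapter at its
    top-k highest-norm layers by alpha, leaving the rest untouched."""
    norms = layer_norms(adapter)
    top = set(sorted(norms, key=norms.get, reverse=True)[:top_k])
    return {name: [w * alpha for w in deltas] if name in top else list(deltas)
            for name, deltas in adapter.items()}

def conflict_aware_internalize(adapter, base_logprob, tau=-1.0, **boost_kw):
    """Conflict-Aware Internalization (sketch): boost only when the base
    model is confident in the contradicted fact, i.e. its log-probability
    exceeds the (hypothetical) threshold tau; otherwise apply the adapter as-is."""
    if base_logprob > tau:
        return selective_layer_boost(adapter, **boost_kw)
    return adapter
```

For example, an adapter `{"layer.3": [3.0, 4.0], "layer.7": [1.0, 0.0]}` paired with a confident base model (`base_logprob=-0.2`) would have `layer.3` (norm 5) scaled to `[6.0, 8.0]`, while a weak-prior case (`base_logprob=-5.0`) would pass through unchanged.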

Abstract

Hypernetwork-based methods such as Doc-to-LoRA internalize a document into an LLM's weights in a single forward pass, but they fail systematically on conflicts: when the document contradicts pretraining knowledge, accuracy collapses to 46.4% on the deepest facts. We show the failure is a magnitude problem rather than a representational one. The hypernetwork already targets the right layers, but its adapter margin is approximately constant across documents while the pretrained margin grows with training frequency, so deep conflicts lose by construction. The account predicts that failure should track prior strength: sorting 194 conflicts by the base model's log-probability on the contradicted fact, baseline accuracy falls from 68% on weak-prior questions to 16% on strong-prior ones, a 52 percentage-point gap. The cure is amplitude. Selective Layer Boosting scales the adapter at its top-norm layers, and Conflict-Aware Internalization triggers boosting only when the base model is confident. Both are training-free; together they raise deep-conflict accuracy from 46.4% to 71.0% on Gemma-2B and from 53.6% to 72.5% on Mistral-7B while preserving novel-knowledge recall, and beat vanilla retrieval-augmented generation on medium conflicts by 18 percentage points despite operating entirely in parameter space. We release KID-Bench, a 489-question benchmark that separates novel recall, cross-knowledge combination, and prior-graded conflicts.
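The magnitude account above reduces to a simple comparison: the injected fact wins only when the adapter's (roughly constant) margin exceeds the pretrained margin on the contradicted fact, which grows with training frequency. A toy numerical illustration, where the log-linear growth form, the constant `ADAPTER_MARGIN`, and the coefficient `beta` are assumptions for illustration rather than the paper's fitted values:

```python
import math

ADAPTER_MARGIN = 2.0  # paper's qualitative finding: roughly constant across documents

def pretrained_margin(train_freq, beta=1.5):
    """Assumed functional form: the pretrained margin grows
    log-linearly with the fact's training frequency."""
    return beta * math.log1p(train_freq)

def override_succeeds(train_freq):
    """The document's fact overrides the prior only if the adapter
    margin beats the pretrained margin on the contradicted fact."""
    return ADAPTER_MARGIN > pretrained_margin(train_freq)
```

Under these numbers a rarely seen fact (`train_freq=1`, pretrained margin ≈ 1.04) is overridden, while a frequently seen one (`train_freq=100`, pretrained margin ≈ 6.92) is not, mirroring the 68% → 16% weak-to-strong-prior drop the paper reports.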