The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation

arXiv cs.LG · April 28, 2026

📰 News · Models & Research

Key Points

  • Hypernetwork-based instant adaptation methods (e.g., Doc-to-LoRA) can internalize a document in one forward pass but degrade sharply on cases where the document conflicts with the model’s pretraining knowledge, dropping to 46.4% accuracy on the deepest facts.
  • The paper argues the failure is not due to representational limits: the hypernetwork targets the correct layers, yet its adapter “margin” stays roughly constant while the pretrained margin increases with training frequency, causing strong-prior conflicts to lose by construction.
  • Failure is predicted to correlate with how strongly the base model already favors the contradicted fact: on 194 conflicts ranked by base log-probability, accuracy falls from 68% (weak prior) to 16% (strong prior), a 52-point gap.
  • Proposed training-free fixes focus on “amplitude”: Selective Layer Boosting scales the adapter at its highest-norm layers, and Conflict-Aware Internalization applies boosting only when the base model is confident.
  • With these methods, deep-conflict accuracy improves from 46.4% to 71.0% on Gemma-2B and from 53.6% to 72.5% on Mistral-7B, while preserving novel-knowledge recall; the authors also release KID-Bench (489 questions) to evaluate prior-graded conflicts separately.
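The two training-free fixes can be sketched in a few lines. This is a minimal toy, not the paper's implementation: the adapter is represented as a dict of flattened per-layer delta-weight vectors, and the boost factor `alpha`, the number of boosted layers `top_k`, and the confidence threshold `tau` are all hypothetical values chosen for illustration.

```python
import math

def layer_norms(adapter):
    """L2 norm of each layer's delta weights (adapter: name -> list of floats)."""
    return {name: math.sqrt(sum(w * w for w in deltas))
            for name, deltas in adapter.items()}

def selective_layer_boost(adapter, alpha=2.0, top_k=1):
    """Selective Layer Boosting (sketch): scale the adapter at its
    top-k highest-norm layers by alpha, leaving the rest untouched."""
    norms = layer_norms(adapter)
    top = set(sorted(norms, key=norms.get, reverse=True)[:top_k])
    return {name: [w * alpha for w in deltas] if name in top else list(deltas)
            for name, deltas in adapter.items()}

def conflict_aware_internalize(adapter, base_logprob, tau=-1.0, **boost_kw):
    """Conflict-Aware Internalization (sketch): boost only when the base
    model is confident in the contradicted fact, i.e. its log-probability
    exceeds the (hypothetical) threshold tau; otherwise apply the adapter as-is."""
    if base_logprob > tau:
        return selective_layer_boost(adapter, **boost_kw)
    return adapter
```

For example, an adapter `{"layer.3": [3.0, 4.0], "layer.7": [1.0, 0.0]}` paired with a confident base model (`base_logprob=-0.2`) would have `layer.3` (norm 5) scaled to `[6.0, 8.0]`, while a weak-prior case (`base_logprob=-5.0`) would pass through unchanged.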

Abstract

Hypernetwork-based methods such as Doc-to-LoRA internalize a document into an LLM's weights in a single forward pass, but they fail systematically on conflicts: when the document contradicts pretraining knowledge, accuracy collapses to 46.4% on the deepest facts. We show the failure is a magnitude problem rather than a representational one. The hypernetwork already targets the right layers, but its adapter margin is approximately constant across documents while the pretrained margin grows with training frequency, so deep conflicts lose by construction. The account predicts that failure should track prior strength: sorting 194 conflicts by the base model's log-probability on the contradicted fact, baseline accuracy falls from 68% on weak-prior questions to 16% on strong-prior ones, a 52 percentage-point gap. The cure is amplitude. Selective Layer Boosting scales the adapter at its top-norm layers, and Conflict-Aware Internalization triggers boosting only when the base model is confident. Both are training-free; together they raise deep-conflict accuracy from 46.4% to 71.0% on Gemma-2B and from 53.6% to 72.5% on Mistral-7B while preserving novel-knowledge recall, and beat vanilla retrieval-augmented generation on medium conflicts by 18 percentage points despite operating entirely in parameter space. We release KID-Bench, a 489-question benchmark that separates novel recall, cross-knowledge combination, and prior-graded conflicts.
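The magnitude account above reduces to a simple comparison: the injected fact wins only when the adapter's (roughly constant) margin exceeds the pretrained margin on the contradicted fact, which grows with training frequency. A toy numerical illustration, where the log-linear growth form, the constant `ADAPTER_MARGIN`, and the coefficient `beta` are assumptions for illustration rather than the paper's fitted values:

```python
import math

ADAPTER_MARGIN = 2.0  # paper's qualitative finding: roughly constant across documents

def pretrained_margin(train_freq, beta=1.5):
    """Assumed functional form: the pretrained margin grows
    log-linearly with the fact's training frequency."""
    return beta * math.log1p(train_freq)

def override_succeeds(train_freq):
    """The document's fact overrides the prior only if the adapter
    margin beats the pretrained margin on the contradicted fact."""
    return ADAPTER_MARGIN > pretrained_margin(train_freq)
```

Under these numbers a rarely seen fact (`train_freq=1`, pretrained margin ≈ 1.04) is overridden, while a frequently seen one (`train_freq=100`, pretrained margin ≈ 6.92) is not, mirroring the 68% → 16% weak-to-strong-prior drop the paper reports.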