GRADE: Probing Knowledge Gaps in LLMs through Gradient Subspace Dynamics

arXiv cs.CL / 4/6/2026

Key Points

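The paper addresses the task of determining whether an LLM has enough internal knowledge to answer a question correctly, and points out that hidden-state activations alone can be insufficient (for example, they may capture uninformative features such as style and length).
GRADE (Gradient Dynamics for knowledge gap detection) quantifies the knowledge gap using the cross-layer rank ratio of the gradient to that of the corresponding hidden state subspace, motivated by the view that gradients act as estimators of the required knowledge updates.
Validation on six benchmarks shows that GRADE is effective and robust to input perturbations.
A case study on long-form answers suggests that the gradient chain can generate interpretable explanations of knowledge gaps.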

Abstract

Detecting whether a model's internal knowledge is sufficient to correctly answer a given question is a fundamental challenge in deploying responsible LLMs. In addition to verbalising confidence through LLM self-report, more recent methods explore the model internals, such as the hidden states of the response tokens, to capture how much knowledge is activated. We argue that such activated knowledge may not align with what the query requires, e.g., it may capture stylistic and length-related features that are uninformative for answering the query. To fill this gap, we propose GRADE (Gradient Dynamics for knowledge gap detection), which quantifies the knowledge gap via the cross-layer rank ratio of the gradient to that of the corresponding hidden state subspace. This is motivated by the property of gradients as estimators of the required knowledge updates for a given target. We validate GRADE on six benchmarks, demonstrating its effectiveness and robustness to input perturbations. In addition, we present a case study showing how the gradient chain can generate interpretable explanations of knowledge gaps for long-form answers.
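
The abstract does not spell out how the rank ratio is computed, so the following is only a rough sketch of what a per-layer "rank of the gradient divided by rank of the hidden states" quantity could look like. Everything in it is an assumption made for illustration: the function names, the thresholded-SVD definition of rank, the `rel_tol` parameter, and the choice to represent each layer as a (tokens × hidden_dim) matrix. It is not the authors' implementation.

```python
import numpy as np


def numerical_rank(matrix: np.ndarray, rel_tol: float = 1e-3) -> int:
    """Count singular values larger than rel_tol times the largest one.

    ASSUMPTION: this thresholded rank is an illustrative choice, not
    necessarily the rank estimator used in the paper.
    """
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    if singular_values.size == 0 or singular_values[0] == 0.0:
        return 0
    return int(np.sum(singular_values > rel_tol * singular_values[0]))


def rank_ratio(grad: np.ndarray, hidden: np.ndarray, rel_tol: float = 1e-3) -> float:
    """Ratio of the gradient's numerical rank to the hidden states' rank."""
    hidden_rank = numerical_rank(hidden, rel_tol)
    if hidden_rank == 0:
        raise ValueError("hidden-state matrix has numerical rank 0")
    return numerical_rank(grad, rel_tol) / hidden_rank


def cross_layer_rank_ratios(grads, hiddens, rel_tol: float = 1e-3) -> list[float]:
    """Compute one rank ratio per layer from paired per-layer matrices.

    ASSUMPTION: grads[i] and hiddens[i] are (tokens x hidden_dim) arrays
    taken from the same layer i for the same response.
    """
    if len(grads) != len(hiddens):
        raise ValueError("need one gradient matrix per hidden-state matrix")
    return [rank_ratio(g, h, rel_tol) for g, h in zip(grads, hiddens)]
```

In this sketch, a layer whose gradient spans as many directions as its hidden states yields a ratio near 1, and a gradient confined to a few directions yields a smaller ratio. How such per-layer values would be combined into a single knowledge-gap score is not stated in the abstract and is deliberately left out here.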