Tracing Relational Knowledge Recall in Large Language Models

arXiv cs.CL · April 23, 2026

📰 News · Models & Research

Key Points

  • The paper investigates how large language models retrieve relational knowledge during text generation, aiming to find internal representations that can support relation classification via linear probes.
  • It compares multiple latent representations derived from attention heads and MLP components, concluding that per-head attention contributions to the residual stream are especially strong for linear relation classification.
  • The study analyzes trained probe feature attributions and shows that probe accuracy correlates with relation specificity, entity connectedness, and how broadly the relevant signal is distributed across attention heads.
  • It demonstrates that token-level feature attribution of probe predictions can expose probe (and model) behavior at a finer granularity.
  • Overall, the work clarifies which internal signals are most linearly usable for relation extraction and why different relation types differ in linear separability.
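The probing setup described above can be sketched in code. The following is a minimal illustration, not the paper's actual pipeline: the shapes, the synthetic per-head contribution vectors, and the choice of which heads carry the relation signal are all assumptions made for the example. The idea it demonstrates is the core one, however: concatenate each attention head's contribution to the residual stream into one feature vector and fit a linear probe to classify the relation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical setup: n_heads attention heads, each writing a d_model-dim
# contribution to the residual stream. In the paper these would be extracted
# from a real transformer; here they are random stand-ins.
n_examples, n_heads, d_model = 600, 8, 16
X = rng.normal(size=(n_examples, n_heads, d_model))

# Synthetic relation labels: plant a linearly decodable signal in two heads,
# mimicking the finding that per-head contributions are strong probe features.
y = rng.integers(0, 3, size=n_examples)   # three relation classes
signal = np.eye(3)[y]                     # one-hot class signal
X[:, 2, :3] += 4.0 * signal               # head 2 carries a strong copy
X[:, 5, :3] += 2.0 * signal               # head 5 carries a weaker copy

# Concatenate per-head contributions into one feature vector per example.
features = X.reshape(n_examples, n_heads * d_model)
X_tr, X_te, y_tr, y_te = train_test_split(features, y, random_state=0)

# The linear probe: multinomial logistic regression over the concatenated heads.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = probe.score(X_te, y_te)
print(f"probe accuracy: {acc:.2f}")
```

Because the probe is linear, its learned weights can afterwards be grouped per head to ask which heads the classification relies on, which is the kind of feature-attribution analysis the key points describe.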

Abstract

We study how large language models recall relational knowledge during text generation, with a focus on identifying latent representations suitable for relation classification via linear probes. Prior work shows how attention heads and MLPs interact to resolve subject, predicate, and object, but it remains unclear which representations support faithful linear relation classification and why some relation types are easier to capture linearly than others. We systematically evaluate different latent representations derived from attention head and MLP contributions, showing that per-head attention contributions to the residual stream are comparatively strong features for linear relation classification. Feature attribution analyses of the trained probes, together with characteristics of the different relation types, reveal clear correlations between probe accuracy and relation specificity, entity connectedness, and how broadly the signal the probe relies on is distributed across attention heads. Finally, we show how token-level feature attribution of probe predictions can be used to reveal probe behavior in further detail.
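The token-level attribution the abstract closes with is easiest to see for a linear probe, where weight-times-input attribution decomposes the predicted logit exactly. The sketch below assumes, purely for illustration, that the probe input is a sum of per-token, per-head contribution vectors; the shapes and random weights are stand-ins, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear probe: weight matrix W over features built by
# concatenating per-head contribution vectors.
n_classes, n_heads, d_model, n_tokens = 3, 8, 16, 5
W = rng.normal(size=(n_classes, n_heads * d_model))

# One example: per-token, per-head contributions (random stand-ins for
# residual-stream contributions from a real model).
contrib = rng.normal(size=(n_tokens, n_heads, d_model))
x = contrib.sum(axis=0).reshape(-1)   # probe input: sum tokens, concat heads

logits = W @ x
pred = int(np.argmax(logits))

# Token-level attribution: the probe is linear and its input is a sum over
# tokens, so each token's dot product with the predicted class's weights is
# that token's exact share of the logit.
per_token = contrib.reshape(n_tokens, -1) @ W[pred]

# Head-level attribution: group the weight-times-input products by head.
per_head = (W[pred] * x).reshape(n_heads, d_model).sum(axis=1)

# Both decompositions sum back to the predicted logit exactly.
print(f"predicted class {pred}; top token by attribution: {int(np.argmax(per_token))}")
```

This exactness is what makes linear probes convenient for the kind of analysis the paper runs: per-token and per-head scores are not approximations but an exact additive decomposition of the probe's decision.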