Revisiting Non-Verbatim Memorization in Large Language Models: The Role of Entity Surface Forms

arXiv cs.CL · April 24, 2026

📰 News · Models & Research

Key Points

  • The paper investigates how large language models (LLMs) store and retrieve factual knowledge, focusing on whether memorization depends on the specific surface form used for an entity.
  • It introduces RedirectQA, an entity-based QA dataset built from Wikipedia redirect information that links Wikidata factual triples to multiple categorized surface forms (aliases, abbreviations, spelling variants, and common incorrect forms).
  • Experiments across 13 LLMs show that prediction accuracy can change significantly when only the entity's surface form is modified, indicating that memorized knowledge is not accessed uniformly across names.
  • The effect is category-dependent: models handle minor orthographic spelling variations more robustly than larger lexical changes like aliases and abbreviations.
  • Frequency analyses suggest that both entity-level and surface-level frequencies correlate with accuracy, with entity frequency sometimes contributing beyond surface frequency, implying that memorization is neither purely surface-bound nor fully surface-invariant.
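
The evaluation protocol described above can be sketched in a few lines: query the same factual triple once per categorized surface form and aggregate accuracy per category. This is a minimal illustration, not the paper's code; the entity data, the surface-form categories' contents, and the `answer_fact` model stub are all invented for the example.

```python
# Sketch of surface-form-conditioned QA evaluation in the style of
# RedirectQA: one factual triple is queried under every categorized
# surface form of its subject entity, and accuracy is aggregated per
# category. All data and the model stub below are illustrative.

from collections import defaultdict

# One Wikidata-style triple with categorized surface forms for the subject.
ITEM = {
    "relation": "capital of",
    "answer": "Yamoussoukro",
    "surface_forms": {
        "canonical": ["Côte d'Ivoire"],
        "alias": ["Ivory Coast"],
        "spelling_variant": ["Cote d'Ivoire"],   # minor orthographic change
        "incorrect": ["Cote de Ivoire"],         # common erroneous form
    },
}

def answer_fact(surface_form: str, relation: str) -> str:
    """Stand-in for an LLM call: a toy lookup that only 'recognizes'
    the entity under some surface forms (relation is ignored here)."""
    known = {"Côte d'Ivoire", "Cote d'Ivoire"}
    return "Yamoussoukro" if surface_form in known else "unknown"

def accuracy_by_category(item: dict) -> dict:
    """Query once per surface form; report accuracy per category."""
    totals, correct = defaultdict(int), defaultdict(int)
    for category, forms in item["surface_forms"].items():
        for form in forms:
            totals[category] += 1
            if answer_fact(form, item["relation"]) == item["answer"]:
                correct[category] += 1
    return {c: correct[c] / totals[c] for c in totals}

acc = accuracy_by_category(ITEM)
```

In this toy setup the spelling variant is answered as reliably as the canonical name while the alias fails, mirroring the category-dependent robustness the paper reports; in the real evaluation the stub would be replaced by an actual LLM query.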

Abstract

Understanding what kinds of factual knowledge large language models (LLMs) memorize is essential for evaluating their reliability and limitations. Entity-based QA is a common framework for analyzing non-verbatim memorization, but typical evaluations query each entity using a single canonical surface form, making it difficult to disentangle fact memorization from access through a particular name. We introduce RedirectQA, an entity-based QA dataset that uses Wikipedia redirect information to associate Wikidata factual triples with categorized surface forms for each entity, including alternative names, abbreviations, spelling variants, and common erroneous forms. Across 13 LLMs, we examine surface-conditioned factual memorization and find that prediction outcomes often change when only the entity surface form changes. This inconsistency is category-dependent: models are more robust to minor orthographic variations than to larger lexical variations such as aliases and abbreviations. Frequency analyses further suggest that both entity- and surface-level frequencies are associated with accuracy, and that entity frequency often contributes beyond surface frequency. Overall, factual memorization appears neither purely surface-specific nor fully surface-invariant, highlighting the importance of surface-form diversity in evaluating non-verbatim memorization.
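
The frequency analysis mentioned in the abstract can be illustrated with a small sketch: correlate per-surface-form accuracy with surface-level and entity-level frequency counts. The numbers and the plain Pearson correlation below are purely illustrative and are not taken from the paper, which may use different statistics.

```python
# Toy illustration of the frequency analysis idea: check whether
# per-surface-form accuracy correlates with both surface-level and
# entity-level frequencies. All numbers are synthetic.

import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# (surface_freq, entity_freq, accuracy) per surface form -- fabricated.
rows = [
    (1000, 5000, 0.92),
    (800, 5000, 0.90),
    (50, 5000, 0.70),
    (900, 1200, 0.65),
    (100, 1200, 0.40),
    (20, 300, 0.15),
]
surface_freq = [r[0] for r in rows]
entity_freq = [r[1] for r in rows]
accuracy = [r[2] for r in rows]

r_surface = pearson(surface_freq, accuracy)
r_entity = pearson(entity_freq, accuracy)
```

On this fabricated sample both correlations come out positive, matching the qualitative claim that surface- and entity-level frequencies are each associated with accuracy; disentangling the entity-frequency contribution beyond surface frequency would additionally require a partial-correlation or regression analysis.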