Revisiting Non-Verbatim Memorization in Large Language Models: The Role of Entity Surface Forms
arXiv cs.CL / 4/24/2026
📰 News / Models & Research
Key Points
- The paper investigates how large language models (LLMs) store and retrieve factual knowledge, focusing on whether memorization depends on the specific surface form used for an entity.
- It introduces RedirectQA, an entity-based QA dataset built from Wikipedia redirect information that links Wikidata factual triples to multiple categorized surface forms: aliases, abbreviations, spelling variants, and common incorrect forms (a construction sketch follows this list).
- Experiments across 13 LLMs show that prediction accuracy can change significantly when only the entity's surface form is modified, indicating that access to memorized facts is not fully invariant to the name used (see the per-category evaluation sketch below).
- The effect is category-dependent: models handle minor orthographic variants more robustly than larger lexical changes such as aliases and abbreviations.
- Frequency analyses suggest that both entity-level and surface-form frequencies correlate with accuracy, and that entity frequency sometimes explains accuracy beyond surface frequency alone, pointing to memorization that operates at both the entity and the surface-form level (see the correlation sketch below).
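To make the dataset construction concrete, here is a minimal sketch of what a RedirectQA-style example could look like: one Wikidata triple paired with redirect-derived surface forms of its subject, grouped by category. The class name, fields, question template, and the example triple are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class RedirectQAExample:
    subject: str                    # canonical entity label from Wikidata
    relation: str                   # Wikidata property behind the triple
    answer: str                     # gold object of the triple
    template: str                   # question template for this relation
    surface_forms: dict[str, list[str]] = field(default_factory=dict)

    def questions(self):
        """Yield (category, question, answer) triples, one per surface form."""
        yield "canonical", self.template.format(entity=self.subject), self.answer
        for category, forms in self.surface_forms.items():
            for form in forms:
                yield category, self.template.format(entity=form), self.answer

# Illustrative triple: capital(Côte d'Ivoire) = Yamoussoukro, with
# redirect-style variants of the subject grouped by category.
example = RedirectQAExample(
    subject="Côte d'Ivoire",
    relation="capital",
    answer="Yamoussoukro",
    template="What is the capital of {entity}?",
    surface_forms={
        "alias": ["Ivory Coast"],
        "spelling_variant": ["Cote d'Ivoire"],  # diacritics dropped
        "incorrect": ["Cote de Ivoire"],        # plausible misspelling
    },
)

for category, question, gold in example.questions():
    print(f"[{category}] {question} -> {gold}")
```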
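A hedged sketch of the per-category evaluation the third and fourth points describe: accuracy is aggregated separately for each surface-form category, so robustness to spelling variants can be compared against aliases and abbreviations. The callable interface and the containment-based answer match are assumptions, not the paper's exact protocol.

```python
from collections import defaultdict

def evaluate_by_category(model_answer, items):
    """Aggregate answer accuracy separately for each surface-form category.

    `model_answer` is any callable mapping a question string to an answer
    string (a stand-in for querying one of the evaluated LLMs); `items` is
    an iterable of (category, question, gold_answer) triples such as those
    produced by RedirectQAExample.questions() above.
    """
    correct, total = defaultdict(int), defaultdict(int)
    for category, question, gold in items:
        total[category] += 1
        # lenient containment match; the paper's metric may be stricter
        if gold.lower() in model_answer(question).lower():
            correct[category] += 1
    return {cat: correct[cat] / total[cat] for cat in sorted(total)}

# Toy "model" that only recognizes the canonical name:
dummy = lambda q: "Yamoussoukro" if "Côte d'Ivoire" in q else "Abidjan"
# With `example` from the previous sketch:
#   evaluate_by_category(dummy, example.questions())
#   -> e.g. {"alias": 0.0, "canonical": 1.0, "incorrect": 0.0, ...}
```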
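Finally, a sketch of the kind of frequency analysis the last point describes: rank-correlating per-item correctness with log-scaled entity and surface-form frequencies. The record format and the choice of Spearman correlation are assumptions; note that isolating an entity-frequency contribution "beyond" surface frequency would require a partial correlation or regression, which this sketch omits.

```python
import math
from scipy.stats import spearmanr

def frequency_correlations(records):
    """Rank-correlate per-item correctness with two frequency signals.

    Each record holds "entity_freq" and "surface_freq" (e.g. estimated
    corpus counts) plus "correct" (0 or 1). Log-scaling tames the heavy
    tail of frequency distributions before correlating.
    """
    acc = [r["correct"] for r in records]
    ent = [math.log1p(r["entity_freq"]) for r in records]
    srf = [math.log1p(r["surface_freq"]) for r in records]
    return {
        "entity_vs_accuracy": spearmanr(ent, acc).correlation,
        "surface_vs_accuracy": spearmanr(srf, acc).correlation,
    }

# Toy data: frequent surface forms answered correctly, rare ones not.
records = [
    {"entity_freq": 50_000, "surface_freq": 40_000, "correct": 1},
    {"entity_freq": 50_000, "surface_freq": 200,    "correct": 0},
    {"entity_freq": 300,    "surface_freq": 250,    "correct": 0},
    {"entity_freq": 80_000, "surface_freq": 60_000, "correct": 1},
]
print(frequency_correlations(records))
```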