Expect the Unexpected? Testing the Surprisal of Salient Entities

arXiv cs.CL / 4/14/2026


Key Points

  • The paper investigates how discourse entity salience affects surprisal, addressing a gap left by prior UID (Uniform Information Density) research that largely ignored participant salience.
  • Using 70K manually annotated mentions across 16 English genres and a novel minimal-pair prompting method, the study finds that globally salient entities have significantly higher surprisal than non-salient entities, even after controlling for confounds such as position, length, and nesting.
  • The authors also report that when salient entities are used as prompts, they systematically reduce surprisal for surrounding content, increasing overall document-level predictability.
  • The magnitude of this prompt-driven predictability effect varies by genre: it appears strongest in topic-coherent texts and weakest in conversational contexts.
  • Overall, the work refines the UID competing pressures framework by proposing global entity salience as a mechanism that shapes information distribution across discourse.
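Surprisal is the standard information-theoretic quantity: a token w with model probability p(w | context) carries -log2 p bits. The sketch below shows one plausible way to score multi-token mentions and compare salient with non-salient ones. It is illustrative only. The probabilities are made-up values, not outputs of the paper's language model. Summing over a mention's tokens is an assumed aggregation, because the summary does not say how mention-level surprisal is computed. The helper names are hypothetical. Since a summed score grows with mention length, which the study treats as a confound, a per-token variant is included as well.

```python
import math
from statistics import mean

def surprisal(p: float) -> float:
    """Surprisal in bits of an event with probability p: -log2 p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)

def mention_surprisal(token_probs: list[float]) -> float:
    """Hypothetical aggregation: total surprisal summed over a mention's tokens."""
    return sum(surprisal(p) for p in token_probs)

def mention_surprisal_per_token(token_probs: list[float]) -> float:
    """Length-normalised variant: mean surprisal per token of the mention."""
    return mention_surprisal(token_probs) / len(token_probs)

# Made-up per-token probabilities for illustration; not data from the paper.
salient = [[0.25, 0.5], [0.125]]   # two mentions of a (hypothetical) globally salient entity
non_salient = [[0.5], [0.5, 0.5]]  # two mentions of a (hypothetical) non-salient entity

salient_mean = mean(mention_surprisal(m) for m in salient)
non_salient_mean = mean(mention_surprisal(m) for m in non_salient)
print(f"salient: {salient_mean:.2f} bits, non-salient: {non_salient_mean:.2f} bits")
```

With these toy numbers the salient mentions average 3.00 bits and the non-salient ones 1.50 bits. The numbers were chosen to mirror the direction of the reported finding, not to reproduce it; the actual analysis also controls for position, length, and nesting.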

Abstract

Previous work examining the Uniform Information Density (UID) hypothesis has shown that while information as measured by surprisal metrics is distributed more or less evenly across documents overall, local discrepancies can arise due to functional pressures corresponding to syntactic and discourse structural constraints. However, work thus far has largely disregarded the relative salience of discourse participants. We fill this gap by studying how overall salience of entities in discourse relates to surprisal using 70K manually annotated mentions across 16 genres of English and a novel minimal-pair prompting method. Our results show that globally salient entities exhibit significantly higher surprisal than non-salient ones, even controlling for position, length, and nesting confounds. Moreover, salient entities systematically reduce surprisal for surrounding content when used as prompts, enhancing document-level predictability. This effect varies by genre, appearing strongest in topic-coherent texts and weakest in conversational contexts. Our findings refine the UID competing pressures framework by identifying global entity salience as a mechanism shaping information distribution in discourse.
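The prompt-driven effect can be pictured as a minimal pair: score the same continuation under two prompts that differ only in whether a salient entity is mentioned, then take the difference in surprisal. The abstract does not spell out the paper's prompting procedure, so the sketch below is a hypothetical analogue, not the authors' method. To stay self-contained it swaps in a toy cache language model (a uniform distribution interpolated with token frequencies in the running context) for the neural model such a study would use. The vocabulary size, interpolation weight, prompts, and function names are all assumptions.

```python
import math

def cache_prob(token: str, context: list[str], vocab_size: int, lam: float = 0.5) -> float:
    """Toy cache LM: lam * relative frequency of token in context + (1 - lam) * uniform."""
    uniform = 1.0 / vocab_size
    if not context:
        return uniform
    cache = context.count(token) / len(context)
    return lam * cache + (1.0 - lam) * uniform

def continuation_surprisal(prompt: list[str], continuation: list[str],
                           vocab_size: int = 1000, lam: float = 0.5) -> float:
    """Total surprisal (bits) of the continuation given the prompt, scored token by token."""
    context = list(prompt)
    total = 0.0
    for token in continuation:
        total += -math.log2(cache_prob(token, context, vocab_size, lam))
        context.append(token)  # each scored token becomes part of the context
    return total

# Hypothetical minimal pair: the two prompts differ only in the entity slot.
continuation = "curie won the prize and curie kept teaching".split()
with_entity = continuation_surprisal(["about", "curie"], continuation)
control = continuation_surprisal(["about", "someone"], continuation)
print(f"reduction from salient-entity prompt: {control - with_entity:.2f} bits")
```

A positive reduction means the entity prompt made the continuation more predictable. In this toy model the gain comes only from the entity recurring in the continuation, which is a much cruder mechanism than anything a neural language model would exploit.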