Revisiting Real-Time Digging-In Effects: No Evidence from NP/Z Garden-Paths

arXiv cs.CL · March 26, 2026


Key Points

  • The paper addresses whether “digging-in” effects in NP/Z garden-path sentences are genuine real-time evidence for self-organized sentence processing or instead arise from wrap-up processes and methodological confounds.
  • Two experiments (Maze task and self-paced reading) compare human results against predictions from an ensemble of large language models.
  • The study finds no clear evidence of real-time digging-in effects in human sentence processing.
  • Sentence-final vs. nonfinal disambiguation items produce qualitatively different patterns, with seemingly positive digging-in trends occurring only sentence-finally due to wrap-up confounds.
  • Nonfinal items show reverse trends that align with neural language model predictions, supporting the surprisal-theory expectation of no digging-in under non-shifting statistical expectations.
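Surprisal theory quantifies the processing difficulty of a word as the negative log probability of that word given its preceding context. The sketch below illustrates the prediction referenced above: if lengthening the ambiguous region does not actually shift the conditional probability of the disambiguating word, its surprisal (and hence predicted difficulty) is unchanged. The probability values are hypothetical placeholders, not figures from the paper.

```python
import math

def surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 of a word's conditional probability."""
    return -math.log2(prob)

# Hypothetical conditional probabilities of the disambiguating verb
# in an NP/Z garden-path sentence (illustrative values only).
p_short_ambiguous_region = 0.02
p_long_ambiguous_region = 0.02  # lengthening alone need not shift
                                # the statistical expectation

# Equal probabilities imply equal surprisal: no digging-in predicted.
print(surprisal(p_short_ambiguous_region))  # ~5.64 bits
print(surprisal(p_long_ambiguous_region))   # ~5.64 bits
```

Under this view, a digging-in effect would require the longer ambiguous region to genuinely lower the conditional probability of the disambiguating word, which is an empirical question about the language statistics rather than a built-in property of the parser.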

Abstract

Digging-in effects, where disambiguation difficulty increases with longer ambiguous regions, have been cited as evidence for self-organized sentence processing, in which structural commitments strengthen over time. In contrast, surprisal theory predicts no such effect unless lengthening genuinely shifts statistical expectations, and neural language models appear to show the opposite pattern. Whether digging-in is a robust real-time phenomenon in human sentence processing -- or an artifact of wrap-up processes or methodological confounds -- remains unclear. We report two experiments on English NP/Z garden-path sentences using Maze and self-paced reading, comparing human behavior with predictions from an ensemble of large language models. We find no evidence for real-time digging-in effects. Critically, items with sentence-final versus nonfinal disambiguation show qualitatively different patterns: positive digging-in trends appear only sentence-finally, where wrap-up effects confound interpretation. Nonfinal items -- the cleaner test of real-time processing -- show reverse trends consistent with neural model predictions.