Probing for Reading Times
arXiv cs.CL / 4/22/2026
📰 News · Models & Research
Key Points
- The paper investigates whether language models' internal representations carry cognitive signals that correlate with human reading times, using eye-tracking data from five languages.
- Regularized linear regression probes are trained on each model layer's representations and compared against several scalar predictors, including surprisal, information value, and logit-lens surprisal (minimal sketches follow this list).
- Results show that early-layer representations predict early eye-tracking measures (e.g., first fixation duration and gaze duration) better than surprisal, suggesting that low-level lexical and structural information aligns with early stages of human processing.
- For later reading-time measures (e.g., total reading time), surprisal remains the strongest predictor despite being a far more compressed signal, indicating that different mechanisms operate at different reading stages.
- The best predictor varies by language and eye-tracking metric, and combining surprisal with early-layer representations improves performance, as sketched below.
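To make the setup concrete, here is a minimal sketch of the layer-wise probing described above, assuming per-word hidden states and eye-tracking measures have already been extracted and aligned. The arrays and names (`hidden_states`, `surprisal`, `gaze_duration`, `probe_r2`) are illustrative stand-ins, not the paper's code.

```python
# Minimal sketch of layer-wise linear probing of reading times.
# Random arrays stand in for real, aligned per-word data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_words, n_layers, d_model = 1000, 12, 256

hidden_states = rng.normal(size=(n_layers, n_words, d_model))  # per-layer reps
surprisal = rng.normal(size=(n_words, 1))                      # scalar baseline
gaze_duration = rng.normal(size=n_words)                       # reading-time target

def probe_r2(features, target):
    """Cross-validated R^2 of a regularized linear probe."""
    return cross_val_score(Ridge(alpha=1.0), features, target,
                           cv=5, scoring="r2").mean()

# Scalar baseline: surprisal alone.
baseline = probe_r2(surprisal, gaze_duration)

# One probe per layer on the full hidden-state vectors.
for layer in range(n_layers):
    score = probe_r2(hidden_states[layer], gaze_duration)
    print(f"layer {layer:2d}: R^2 = {score:.3f} (surprisal baseline {baseline:.3f})")
```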
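The combined predictor from the last key point can be sketched by concatenating surprisal with an early layer's representations before fitting the same probe, reusing `probe_r2` and the arrays above; layer 2 is an arbitrary illustrative choice.

```python
# Combined predictor: early-layer representations plus scalar surprisal.
early = 2
combined = np.concatenate([hidden_states[early], surprisal], axis=1)
print(f"early layer + surprisal: R^2 = {probe_r2(combined, gaze_duration):.3f}")
```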
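For the logit-lens surprisal predictor, a common recipe (not necessarily the paper's exact one) is to push each intermediate layer's hidden states through the model's final layer norm and unembedding matrix, then read off the surprisal of the actual next token. GPT-2 here is only a small stand-in model.

```python
# Logit-lens surprisal: decode intermediate layers with the output head.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

ids = tok("The cat sat on the mat.", return_tensors="pt").input_ids
with torch.no_grad():
    hs = model(ids, output_hidden_states=True).hidden_states  # embeddings + each layer

for layer, h in enumerate(hs):
    logits = model.lm_head(model.transformer.ln_f(h))  # the "logit lens"
    logp = F.log_softmax(logits, dim=-1)
    # Surprisal (in nats) of each token given its prefix, from this layer.
    s = -logp[0, :-1].gather(1, ids[0, 1:, None]).squeeze(1)
    print(f"layer {layer:2d}: mean surprisal {s.mean():.2f} nats")
```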
Related Articles

GeoReg: LLM-Driven Few-Shot Socio-Economic Estimation for Data-Scarce Regions
Dev.to

Rethinking CNN Models for Audio Classification
Dev.to

Anthropic’s most dangerous AI model just fell into the wrong hands
The Verge
v0.20.0rc1
vLLM Releases
I built my own event bus for a sustainability app — here's what I learned about agent automation using OpenClaw
Dev.to