Can LLMs Perceive Time? An Empirical Investigation

arXiv cs.AI / 4/2/2026

Key Points

  • The paper reports that LLMs generally cannot reliably estimate how long their own tasks will take, finding large estimation errors across 68 tasks and four model families.
  • In pre-task duration estimates, models systematically overshoot real durations by 4–7×, often predicting human-scale minutes for tasks that finish in seconds.
  • Experiments on relative ordering show at-or-below-chance performance on counter-intuitive task pairs (GPT-5 scores 18%), indicating heuristic or label-driven behavior rather than genuine time understanding.
  • Post-hoc recall of duration is also poorly calibrated: estimates diverge from actuals by roughly an order of magnitude in either direction. Similar failures persist in multi-step agentic settings, with errors of 5–10×.
  • The authors conclude that while LLMs may contain propositional duration knowledge from training, they lack experiential grounding in their own inference time, with direct implications for agent scheduling, planning, and time-critical use cases.
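To make the "4–7×" figure concrete, here is a minimal sketch of how an overshoot factor could be computed from paired estimates and measurements. The function names and sample numbers are hypothetical and chosen for illustration; the summary does not say how the authors aggregate their ratios, so the geometric mean used here is an assumption.

```python
import math


def overshoot_ratio(estimated_s, actual_s):
    """Return estimated / actual; values above 1 mean the estimate was too long."""
    if estimated_s <= 0 or actual_s <= 0:
        raise ValueError("durations must be positive")
    return estimated_s / actual_s


def geometric_mean_ratio(pairs):
    """Geometric mean of estimate/actual ratios over (estimated_s, actual_s) pairs.

    Averaging in log space means a 10x overestimate and a 10x underestimate
    cancel out, which an arithmetic mean of ratios would not do.
    """
    logs = [math.log(overshoot_ratio(est, act)) for est, act in pairs]
    return math.exp(sum(logs) / len(logs))


# Made-up illustrative durations in seconds, NOT data from the paper.
pairs = [(120.0, 20.0), (60.0, 15.0), (300.0, 50.0)]
print(round(geometric_mean_ratio(pairs), 2))
```

A ratio reported this way treats over- and underestimation symmetrically, which matters when errors run "an order of magnitude in either direction".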

Abstract

Large language models cannot estimate how long their own tasks take. We investigate this limitation through four experiments across 68 tasks and four model families. Pre-task estimates overshoot actual duration by 4–7× (p < 0.001), with models predicting human-scale minutes for tasks completing in seconds. Relative ordering fares no better: on task pairs designed to expose heuristic reliance, models score at or below chance (GPT-5: 18% on counter-intuitive pairs, p = 0.033), systematically failing when complexity labels mislead. Post-hoc recall is disconnected from reality: estimates diverge from actuals by an order of magnitude in either direction. These failures persist in multi-step agentic settings, with errors of 5–10×. The models possess propositional knowledge about duration from training but lack experiential grounding in their own inference time, with practical implications for agent scheduling, planning and time-critical scenarios.
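The relative-ordering result compares a model's accuracy against the 50% expected from guessing. A rough sketch of that kind of check follows, assuming an exact two-sided binomial test. The abstract does not name the test the authors used or give the number of pairs, so the helper names and the choice of test are assumptions, and this sketch is not meant to reproduce the reported p-value.

```python
from math import comb


def ordering_accuracy(predicted_faster, actual_faster):
    """Fraction of task pairs where the predicted faster task matches the measured one."""
    if len(predicted_faster) != len(actual_faster) or not actual_faster:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == a for p, a in zip(predicted_faster, actual_faster))
    return correct / len(actual_faster)


def binomial_two_sided_p(successes, trials, chance=0.5):
    """Exact two-sided binomial p-value against a fixed chance level.

    Sums the probability of every outcome that is no more likely than the
    observed one (a small relative tolerance absorbs floating-point noise).
    """
    pmf = [
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[successes]
    return sum(p for p in pmf if p <= observed * (1 + 1e-9))


# Hypothetical labels for three task pairs, for illustration only.
predicted = ["A", "B", "A"]
actual = ["A", "A", "A"]
print(ordering_accuracy(predicted, actual))
```

A score significantly below chance, as reported for the counter-intuitive pairs, suggests the model is following a misleading cue consistently and not guessing at random.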