Can LLMs Perceive Time? An Empirical Investigation

arXiv cs.AI / 4/2/2026

Key Points

The paper reports that LLMs generally cannot reliably estimate how long their own tasks will take, finding large estimation errors across 68 tasks and four model families.
In pre-task duration estimates, models systematically overshoot real durations by 4–7×, often predicting human-scale minutes for tasks that finish in seconds.
Experiments on relative ordering show near-chance or worse performance on counter-intuitive task pairs, indicating that models rely on heuristics and complexity labels rather than any genuine understanding of time.
Post-hoc recall of duration is also poorly calibrated, with estimates diverging from actuals by roughly an order of magnitude in either direction. Similar failures persist in multi-step agentic settings, with 5–10× errors.
The authors conclude that while LLMs may contain propositional duration knowledge from training, they lack experiential grounding in their own inference time, with direct implications for agent scheduling, planning, and time-critical use cases.
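The error figures in these points (4–7×, 5–10×, an order of magnitude in either direction) are multiplicative, and two simple metrics make them concrete. The sketch below assumes each task yields one model-estimated duration and one measured wall-clock duration in seconds. The sample pairs are invented for illustration, the function names are hypothetical, and summarizing ratios with a geometric mean is our choice, not necessarily the paper's aggregation.

```python
import math
from statistics import geometric_mean

# Hypothetical (estimated_seconds, actual_seconds) pairs, invented for
# illustration; these are not measurements from the paper.
pairs = [(120.0, 20.0), (300.0, 60.0), (60.0, 15.0)]


def overestimation_ratio(estimated: float, actual: float) -> float:
    """Estimated / actual duration; values above 1 mean the estimate overshoots."""
    if estimated <= 0 or actual <= 0:
        raise ValueError("durations must be positive")
    return estimated / actual


def log10_error(estimated: float, actual: float) -> float:
    """Signed error in orders of magnitude: +1 is 10x too long, -1 is 10x too short."""
    return math.log10(overestimation_ratio(estimated, actual))


ratios = [overestimation_ratio(e, a) for e, a in pairs]  # [6.0, 5.0, 4.0]
# A geometric mean suits multiplicative errors: a 10x overshoot and a 10x
# undershoot average to about 1.0, versus 5.05 under an arithmetic mean.
typical_ratio = geometric_mean(ratios)  # roughly 4.93
abs_log_errors = [abs(log10_error(e, a)) for e, a in pairs]
```

Working in log space keeps overshoots and undershoots symmetric, which matters when, as with post-hoc recall, errors run in both directions.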

Abstract

Large language models cannot estimate how long their own tasks take. We investigate this limitation through four experiments across 68 tasks and four model families. Pre-task estimates overshoot actual duration by 4–7× (p < 0.001), with models predicting human-scale minutes for tasks completing in seconds. Relative ordering fares no better: on task pairs designed to expose heuristic reliance, models score at or below chance (GPT-5: 18% on counter-intuitive pairs, p = 0.033), systematically failing when complexity labels mislead. Post-hoc recall is disconnected from reality: estimates diverge from actuals by an order of magnitude in either direction. These failures persist in multi-step agentic settings, with errors of 5–10×. The models possess propositional knowledge about duration from training but lack experiential grounding in their own inference time, with practical implications for agent scheduling, planning and time-critical scenarios.
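To make "at or below chance" concrete: when asked which of two tasks takes longer, pure guessing is right 50% of the time, so accuracy well under 50% points to a systematic bias rather than noise. The abstract does not say which test produced p = 0.033; the sketch below assumes a one-sided exact binomial test against a 50% guessing rate. The function names and the sample labels are hypothetical.

```python
from math import comb


def ordering_accuracy(predicted_longer: list[str], actually_longer: list[str]) -> float:
    """Fraction of task pairs where the model picked the task that really took longer."""
    if not predicted_longer or len(predicted_longer) != len(actually_longer):
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(p == a for p, a in zip(predicted_longer, actually_longer))
    return hits / len(predicted_longer)


def below_chance_p_value(k: int, n: int, chance: float = 0.5) -> float:
    """One-sided exact binomial p-value P(X <= k), X ~ Binomial(n, chance).

    A small value means scoring only k of n is unlikely under pure guessing,
    i.e. the model is significantly *below* chance, not merely noisy.
    """
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    return sum(comb(n, i) * chance**i * (1 - chance) ** (n - i) for i in range(k + 1))


# Hypothetical labels: which task of each pair the model said would take longer,
# and which one actually did.
accuracy = ordering_accuracy(["A", "B", "A", "A"], ["B", "B", "B", "B"])  # 0.25
p_value = below_chance_p_value(1, 10)  # 11/1024, about 0.011
```

For example, 1 correct pick out of 10 hypothetical pairs gives P(X ≤ 1) = 11/1024 ≈ 0.011, so coin-flip guessing would rarely do that badly. A two-sided test would roughly double this value for a symmetric 50% null.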
