Temporal Flattening in LLM-Generated Text: Comparing Human and LLM Writing Trajectories

arXiv cs.CL / 4/15/2026


Key Points

  • The paper investigates whether LLMs can reproduce the longitudinal “trajectory” of human writing across long time spans when deployed in stateless or history-conditioned interaction settings.
  • It introduces a released longitudinal dataset covering 412 human authors and 6,086 documents from 2012–2024 across academic abstracts, blogs, and news, and generates comparable trajectories using three representative LLMs.
  • Using drift and variance metrics over semantic, lexical, and cognitive-emotional representations, the study finds “temporal flattening” in LLM outputs: LLMs show less semantic and cognitive-emotional change over time than humans.
  • Although LLM-generated text has greater lexical diversity, the reduced semantic and emotional drift makes temporal-variability patterns highly predictive for distinguishing human vs. LLM trajectories (94% accuracy, 98% ROC-AUC).
  • The authors conclude that this temporal-flattening gap persists even when models use incremental history, with implications for synthetic training data quality and longitudinal text modeling.
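The drift and variance metrics above can be sketched in a few lines. This is a minimal illustration, not the paper's exact definitions: it measures drift as the mean cosine distance between consecutive document embeddings in a time-ordered trajectory, and variability as the mean per-dimension variance. The function name and the toy trajectories are assumptions for illustration.

```python
import numpy as np

def drift_and_variance(embeddings):
    """Trajectory statistics over time-ordered document embeddings:
    mean consecutive cosine distance (drift) and mean per-dimension
    variance. Illustrative metrics, not the paper's exact formulas."""
    E = np.asarray(embeddings, dtype=float)
    # Normalize rows so cosine distance is 1 - dot product of unit vectors.
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    drift = float(np.mean(1.0 - np.sum(E[:-1] * E[1:], axis=1)))
    variance = float(np.mean(np.var(E, axis=0)))
    return drift, variance

# A wandering (human-like) trajectory vs. a nearly static (flattened) one:
rng = np.random.default_rng(0)
human = np.cumsum(rng.normal(size=(10, 8)), axis=0)
llm = np.tile(rng.normal(size=(1, 8)), (10, 1)) + 0.01 * rng.normal(size=(10, 8))

d_h, _ = drift_and_variance(human)
d_l, _ = drift_and_variance(llm)
assert d_h > d_l  # the flatter trajectory shows less semantic drift
```

On synthetic data like this, the near-static trajectory yields drift close to zero, which is the qualitative signature of temporal flattening the paper reports.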

Abstract

Large language models (LLMs) are increasingly used in daily applications, from content generation to code writing, where each interaction treats the model as stateless, generating responses independently without memory. Yet human writing is inherently longitudinal: authors' styles and cognitive states evolve across months and years. This raises a central question: can LLMs reproduce such temporal structure across extended time periods? We construct and publicly release a longitudinal dataset of 412 human authors and 6,086 documents spanning 2012–2024 across three domains (academic abstracts, blogs, news) and compare them to trajectories generated by three representative LLMs under standard and history-conditioned generation settings. Using drift and variance-based metrics over semantic, lexical, and cognitive-emotional representations, we find temporal flattening in LLM-generated text. LLMs produce greater lexical diversity but exhibit substantially reduced semantic and cognitive-emotional drift relative to humans. These differences are highly predictive: temporal variability patterns alone achieve 94% accuracy and 98% ROC-AUC in distinguishing human from LLM trajectories. Our results demonstrate that temporal flattening persists regardless of whether LLMs generate independently or with access to incremental history, revealing a fundamental property of current deployment paradigms. This gap has direct implications for applications requiring authentic temporal structure, such as synthetic training data and longitudinal text modeling.
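The abstract's claim that variability patterns alone separate human from LLM trajectories can be illustrated with a one-feature threshold classifier. The drift values and the midpoint-threshold rule below are hypothetical toy inputs, not the paper's data or classifier; the point is only that a single well-separated variability feature can already distinguish the two classes.

```python
import numpy as np

# Hypothetical per-trajectory drift features (illustrative numbers only;
# the paper's 94% accuracy / 98% ROC-AUC come from its real dataset).
human_drift = np.array([0.31, 0.27, 0.35, 0.29, 0.33])
llm_drift = np.array([0.08, 0.11, 0.06, 0.10, 0.09])

features = np.concatenate([human_drift, llm_drift])
labels = np.array([1] * 5 + [0] * 5)  # 1 = human, 0 = LLM

# Predict "human" when drift exceeds the midpoint between class means.
threshold = (human_drift.mean() + llm_drift.mean()) / 2
preds = (features > threshold).astype(int)
accuracy = float((preds == labels).mean())
assert accuracy == 1.0  # these toy classes separate perfectly
```

In practice the paper uses multiple variability features across semantic, lexical, and cognitive-emotional representations; a trained classifier over such features is what achieves the reported scores.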