Identifying the Periodicity of Information in Natural Language

arXiv cs.CL / 4/27/2026

Key Points

  • The paper asks whether the information encoded in natural language, as measured by surprisal, follows periodic patterns.
  • It introduces “AutoPeriod of Surprisal (APS),” a method that applies a canonical periodicity-detection algorithm to the surprisal sequence within a single document.
  • Experiments on multiple corpora suggest that a substantial portion of human language exhibits strong information periodicity.
  • The study also finds additional significant periods that do not align with typical text structural units (like sentence boundaries) and supports them using harmonic regression.
  • It concludes that observed periodicity arises from both structured linguistic factors and longer-range drivers, and discusses potential uses for detecting LLM-generated text.
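
As a rough illustration of what periodicity detection on a surprisal sequence can look like, here is a minimal Python sketch. It assumes the per-token surprisal values (negative log-probabilities from some language model) are already available as an array, and it follows a generic two-step recipe: pick candidate periods from the periodogram using a permutation-based power threshold, then keep only those that sit on a local peak of the autocorrelation. The function name `detect_periods`, the 99th-percentile threshold, and the number of permutations are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def detect_periods(surprisal, n_perm=100, seed=0):
    """Hypothetical sketch: return periods (in tokens) that pass a
    periodogram power threshold and an autocorrelation peak check."""
    x = np.asarray(surprisal, dtype=float)
    x = x - x.mean()                      # remove the constant (DC) component
    n = len(x)
    rng = np.random.default_rng(seed)

    def power(v):
        return np.abs(np.fft.rfft(v)) ** 2

    p = power(x)

    # Shuffling destroys any periodic structure, so the largest power seen
    # in shuffled copies serves as a noise baseline (assumed 99th percentile).
    max_powers = [power(rng.permutation(x))[1:].max() for _ in range(n_perm)]
    threshold = np.percentile(max_powers, 99)

    # Autocorrelation via FFT, zero-padded to avoid circular wrap-around.
    f = np.fft.rfft(x, 2 * n)
    acf = np.fft.irfft(f * np.conj(f))[:n]
    acf = acf / acf[0]

    periods = set()
    for k in range(1, len(p)):            # k = number of cycles in the document
        if p[k] <= threshold:
            continue
        period = int(round(n / k))
        if period < 2 or period > n // 2:
            continue                      # too short or too long to validate
        # Keep the candidate only if it lies on a positive local ACF peak.
        if (acf[period] > 0
                and acf[period] >= acf[period - 1]
                and acf[period] >= acf[period + 1]):
            periods.add(period)
    return sorted(periods)

if __name__ == "__main__":
    # Synthetic "surprisal" with a period of 20 tokens plus mild noise.
    t = np.arange(400)
    noise = np.random.default_rng(1).standard_normal(400)
    demo = 5 + 2 * np.sin(2 * np.pi * t / 20) + 0.3 * noise
    print(detect_periods(demo))
```

The autocorrelation check is there because periodogram peaks alone can be spurious or imprecise, and a true period should also show up as a peak in the autocorrelation at the same lag.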

Abstract

Recent theoretical advances in the study of information density in natural language raise the following question: to what degree does natural language exhibit periodic patterns in its encoded information? We address this question by introducing a new method called AutoPeriod of Surprisal (APS). APS adopts a canonical periodicity detection algorithm and can identify any significant periods that exist in the surprisal sequence of a single document. Applying the algorithm to a set of corpora yields the following results. First, a considerable proportion of human language demonstrates a strong pattern of periodicity in information. Second, new periods that fall outside the distributions of typical structural units in text (e.g., sentence boundaries, elementary discourse units) are found and further confirmed via harmonic regression modeling. We conclude that the periodicity of information in language is a joint outcome of structured factors and other driving factors that take effect at longer distances. The advantages of our periodicity detection method and its potential for detecting LLM-generated text are further discussed.
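
To make the harmonic regression step more concrete, the following Python sketch fits a surprisal sequence with an intercept plus one cosine and one sine term per candidate period, using ordinary least squares. It reports the fitted amplitude of each period and the overall R². The function name `harmonic_fit` and the choice of R² as the summary statistic are assumptions made for illustration. The abstract does not specify the exact model or significance test the authors use, so none is reproduced here.

```python
import numpy as np

def harmonic_fit(surprisal, periods):
    """Hypothetical sketch: least-squares fit of
    s_t = b0 + sum_P [a_P cos(2*pi*t/P) + b_P sin(2*pi*t/P)] + noise.
    Returns (amplitude per period, R^2 of the fit)."""
    y = np.asarray(surprisal, dtype=float)
    t = np.arange(len(y))

    # Design matrix: intercept, then a cos/sin pair for each period.
    cols = [np.ones_like(y)]
    for P in periods:
        cols.append(np.cos(2 * np.pi * t / P))
        cols.append(np.sin(2 * np.pi * t / P))
    X = np.column_stack(cols)

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    # Amplitude of each harmonic is the norm of its (cos, sin) coefficients.
    amps = np.hypot(beta[1::2], beta[2::2])
    r2 = 1.0 - resid.var() / y.var()
    return amps, r2

if __name__ == "__main__":
    # Synthetic "surprisal": amplitude 2 at a period of 20 tokens, mild noise.
    t = np.arange(400)
    noise = np.random.default_rng(1).standard_normal(400)
    demo = 5 + 2 * np.sin(2 * np.pi * t / 20) + 0.3 * noise
    amps, r2 = harmonic_fit(demo, [20])
    print(amps, r2)
```

A period that reflects real structure should receive a clearly non-zero amplitude and improve the fit, whereas a spurious period contributes an amplitude close to zero.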
