Narrative Fingerprints: Multi-Scale Author Identification via Novelty Curve Dynamics

arXiv cs.CL / 4/3/2026


Key Points

  • The paper investigates whether individual authors can be identified from measurable patterns in information-theoretic “novelty curves” across their texts.
  • Using Books3 and PG-19, it reports multi-scale author signals: book-level scalar novelty dynamics (mean novelty, speed, volume, circuitousness) identify 43% of authors significantly above chance, while chapter-level SAX motif patterns in sliding windows reach 30x-above-chance attribution.
  • The study finds the book-level and chapter-level signals are complementary rather than redundant, implying different levels of text structure carry distinct authorial information.
  • It shows the attribution signal is partly confounded by genre but remains detectable within genres for roughly one-quarter of authors.
  • It argues the effect is not merely a modern-format artifact by noting comparable fingerprint strength for authors like Twain, Austen, and Kipling.

Abstract

We test whether authors have characteristic "fingerprints" in the information-theoretic novelty curves of their published works. Working with two corpora -- Books3 (52,796 books, 759 qualifying authors) and PG-19 (28,439 books, 1,821 qualifying authors) -- we find that authorial voice leaves measurable traces in how novelty unfolds across a text. The signal is multi-scale: at book level, scalar dynamics (mean novelty, speed, volume, circuitousness) identify 43% of authors significantly above chance; at chapter level, SAX motif patterns in sliding windows achieve 30x-above-chance attribution, far exceeding the scalar features that dominate at book level. These signals are complementary, not redundant. We show that the fingerprint is partly confounded with genre but persists within-genre for approximately one-quarter of authors. Classical authors (Twain, Austen, Kipling) show fingerprints comparable in strength to modern authors, suggesting the phenomenon is not an artifact of contemporary publishing conventions.
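
The abstract names SAX motif patterns in sliding windows but does not say how they are computed. The following is a minimal sketch of standard SAX (z-normalization, piecewise aggregate approximation, equiprobable Gaussian breakpoints) applied to a novelty curve. It assumes the curve is a plain list of per-chunk novelty scores. The function names, window length, segment count, and alphabet size are illustrative choices, not the paper's settings.

```python
from bisect import bisect_right
from collections import Counter
from statistics import NormalDist, fmean, pstdev


def sax_word(window, n_segments=4, alphabet="abcd"):
    """Turn one window of novelty values into a SAX word (hypothetical helper)."""
    mu, sigma = fmean(window), pstdev(window)
    # z-normalize so the word reflects shape, not absolute level;
    # a perfectly flat window maps to all zeros.
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in window]
    # Piecewise aggregate approximation: mean of each equal-width segment.
    # Requires len(window) >= n_segments so no segment is empty.
    n = len(z)
    paa = [
        fmean(z[i * n // n_segments:(i + 1) * n // n_segments])
        for i in range(n_segments)
    ]
    # Breakpoints that cut the standard normal into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


def motif_counts(curve, window=8, step=1, **sax_kwargs):
    """Count SAX words over sliding windows of a novelty curve (assumed settings)."""
    return Counter(
        sax_word(curve[i:i + window], **sax_kwargs)
        for i in range(0, len(curve) - window + 1, step)
    )


# Toy curve, not real data: a steadily rising window yields the word "abcd".
example_word = sax_word([1, 1, 2, 2, 3, 3, 4, 4])
example_counts = motif_counts([1, 1, 2, 2, 3, 3, 4, 4, 3, 2])
```

Under these assumptions, the resulting motif histogram for each chapter could serve as a feature vector for an author classifier. How the paper builds and scores such features is not stated in the abstract.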