Decoding AI Authorship: Can LLMs Truly Mimic Human Style Across Literature and Politics?

arXiv cs.CL / 3/25/2026


Key Points

  • The arXiv study evaluates whether leading LLMs (GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet) can mimic the authorial styles of figures from literature and politics using zero-shot prompting with strict thematic alignment.
  • It finds that AI-generated text remains “highly detectable”: an XGBoost classifier trained on only eight stylometric features matches the accuracy of a high-dimensional BERT classifier.
  • Perplexity emerges as the most discriminative metric, suggesting that differences in the stochastic regularity of AI outputs versus human writing drive detectability.
  • While LLMs show partial convergence on low-dimensional heuristics (e.g., syntactic complexity and readability), they fail to fully reproduce nuanced affective density and stylistic variance.
  • The work provides a benchmark for assessing LLM stylistic behavior and informs authorship attribution efforts in digital humanities and social media.

Abstract

Amidst the rising capabilities of generative AI to mimic specific human styles, this study investigates the ability of state-of-the-art large language models (LLMs), including GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet, to emulate the authorial signatures of prominent literary and political figures: Walt Whitman, William Wordsworth, Donald Trump, and Barack Obama. Utilizing a zero-shot prompting framework with strict thematic alignment, we generated synthetic corpora evaluated through a complementary framework combining transformer-based classification (BERT) and interpretable machine learning (XGBoost). Our methodology integrates Linguistic Inquiry and Word Count (LIWC) markers, perplexity, and readability indices to assess the divergence between AI-generated and human-authored text. Results demonstrate that AI-generated mimicry remains highly detectable, with XGBoost models trained on a restricted set of eight stylometric features achieving accuracy comparable to high-dimensional neural classifiers. Feature importance analyses identify perplexity as the primary discriminative metric, revealing a significant divergence in the stochastic regularity of AI outputs compared to the higher variability of human writing. While LLMs exhibit distributional convergence with human authors on low-dimensional heuristic features, such as syntactic complexity and readability, they do not yet fully replicate the nuanced affective density and stylistic variance inherent in the human-authored corpus. By isolating the specific statistical gaps in current generative mimicry, this study provides a comprehensive benchmark for LLM stylistic behavior and offers critical insights for authorship attribution in the digital humanities and social media.
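The abstract's central finding, that perplexity separates AI mimicry from human writing, rests on a simple quantity: the exponentiated negative mean per-token log-probability under a language model. The sketch below is illustrative only (the function name and inputs are not from the paper, and the study's actual perplexity computation is not specified here); it shows the arithmetic that makes low, tightly clustered perplexity a signature of the "stochastic regularity" the authors describe.

```python
import math

def perplexity(token_logprobs):
    """Perplexity of a sequence from per-token natural-log probabilities.

    Lower values mean the scoring model found the text more predictable.
    The study reports that AI-generated text tends to sit in a narrower,
    more regular perplexity band than the more variable human corpus.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# A model assigning p = 1/4 to each of four tokens yields perplexity 4:
# on average it is "choosing among" four equally likely continuations.
print(round(perplexity([math.log(0.25)] * 4), 6))  # → 4.0
```

Because the measure is a per-token average, it is length-normalized, which is what lets a single scalar be compared across texts of different sizes and used directly as one of the eight stylometric features.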
