Voice Under Revision: Large Language Models and the Normalization of Personal Narrative

arXiv cs.CL / April 27, 2026


Key Points

  • The paper studies how LLM-based rewriting changes the style and “narrative texture” of personal narratives, analyzing 300 texts rewritten by three frontier models under three prompt conditions.
  • Across models and prompt conditions, rewriting consistently drives “stylistic normalization,” with decreases in function words, contractions, and first-person pronouns alongside increases in vocabulary diversity, word length, and punctuation elaboration.
  • Even when prompts explicitly ask the model to preserve the original voice, the edits shrink in magnitude but keep the same direction, and in stylometric feature space the rewritten texts still drift away from their sources.
  • The authors argue these effects can reshape downstream tasks in digital humanities and computational text analysis, since common style/voice signals (e.g., pronouns, contractions, punctuation) may be altered by LLM mediation rather than reflecting original authorship or corpus integrity.

Abstract

This study examines how large language model rewriting alters the style and narrative texture of personal narratives. It analyzes 300 personal narratives rewritten by three frontier LLMs under three prompt conditions: generic improvement, rewrite-only, and voice-preserving revision. Change is measured across 13 linguistic markers drawn from computational stylistics, including function words, vocabulary diversity, word length, punctuation, contractions, first-person pronouns, and emotion words. Across models and prompt conditions, LLM rewriting produces a consistent pattern of stylistic normalization. Function words, contractions, and first-person pronouns decrease, while vocabulary diversity, word length, and punctuation elaboration increase. These shifts occur whether the prompt asks the model to "improve" the text or simply to "rewrite" it. Voice-preserving prompts reduce the magnitude of the changes but do not eliminate their direction. Stylometric analysis shows that rewritten texts converge in feature space and become harder to match back to their source texts. Additional narrative markers indicate a shift from embedded to distanced narration, and from explicit causal reasoning to compressed abstraction. The findings suggest that contemporary LLMs exert a directional pull toward a more polished, less situated register. This has consequences for digital humanities and computational text analysis, where features such as function words, pronouns, contractions, and punctuation often serve as evidence for style, voice, authorship, and corpus integrity. LLM revision should therefore be understood not merely as surface-level editing, but as a consequential form of textual mediation.
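The marker families the abstract lists (function words, vocabulary diversity, word length, contractions, first-person pronouns) are standard computational-stylistics features and are straightforward to compute. The sketch below is illustrative only, assuming simple regex tokenization and toy marker lists; it is not the paper's actual feature set or implementation:

```python
import re

# Illustrative marker lists (NOT the paper's exact inventories):
# a few of the stylometric families the study tracks.
FIRST_PERSON = {"i", "me", "my", "mine", "myself", "we", "us", "our", "ours"}
CONTRACTION_RE = re.compile(r"\b\w+'\w+\b")  # e.g. "couldn't", "it's"

def style_markers(text: str) -> dict:
    """Compute per-1000-token rates and simple ratios for a few markers."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    n = len(tokens)
    if n == 0:
        return {}
    return {
        "first_person_per_1k": 1000 * sum(t in FIRST_PERSON for t in tokens) / n,
        "contractions_per_1k": 1000 * len(CONTRACTION_RE.findall(text.lower())) / n,
        "type_token_ratio": len(set(tokens)) / n,   # vocabulary diversity
        "mean_word_length": sum(len(t) for t in tokens) / n,
    }

# Toy before/after pair mimicking the normalization pattern the paper reports.
original = "I couldn't sleep, so I walked to the lake and I just sat there."
rewritten = "Unable to sleep, the narrator walked to the lake and sat in silence."

o, r = style_markers(original), style_markers(rewritten)
delta = {k: round(r[k] - o[k], 2) for k in o}
print(delta)  # negative first-person/contraction deltas match "normalization"
```

On this toy pair, the first-person and contraction rates drop to zero after rewriting, the directional shift the paper measures at scale across 300 narratives and 13 markers.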