AI Navigate

How LLMs Distort Our Written Language

arXiv cs.CL / 3/20/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper demonstrates that LLMs can alter not only voice and tone but also the intended semantic meaning of human writing.
  • A user study shows heavy LLM use increases the share of essays that remain neutral on the topic by about 70%, indicating reduced stance expression.
  • The authors show that asking an LLM to revise an essay based on human-written feedback substantially changes the essay's meaning, even when the model is instructed to make only grammar edits.
  • Analysis of AI-generated peer reviews at a recent top AI conference finds that such reviews assign scores about a full point higher on average while placing less emphasis on clarity and significance, suggesting a misalignment with the goals of research evaluation.
  • Given this consistent semantic distortion, the findings motivate future work on how widespread AI-assisted writing will affect cultural and scientific institutions.

Abstract

Large language models (LLMs) are used by over a billion people globally, most often to assist with writing. In this work, we demonstrate that LLMs not only alter the voice and tone of human writing, but also consistently alter the intended meaning. First, we conduct a human user study to understand how people actually interact with LLMs when using them for writing. Our findings reveal that extensive LLM use led to a nearly 70% increase in essays that remained neutral in answering the topic question. Significantly more heavy LLM users reported that the writing was less creative and not in their voice. Next, using a dataset of human-written essays that was collected in 2021 before the widespread release of LLMs, we study how asking an LLM to revise the essay based on the human-written feedback in the dataset induces large changes in the resulting content and meaning. We find that even when LLMs are prompted with expert feedback and asked to only make grammar edits, they still change the text in a way that significantly alters its semantic meaning. We then examine LLM-generated text in the wild, specifically focusing on the 21% of AI-generated scientific peer reviews at a recent top AI conference. We find that LLM-generated reviews place significantly less weight on clarity and significance of the research, and assign scores that, on average, are a full point higher. These findings highlight a misalignment between the perceived benefit of AI use and an implicit, consistent effect on the semantics of human writing, motivating future work on how widespread AI writing will affect our cultural and scientific institutions.