Researchers from Berkeley, UCSD, UW, and Google DeepMind studied three datasets: a 100-person controlled user study; 86 pre-LLM human-written essays (collected in 2021), revised by GPT-5-mini, Gemini 2.5 Flash, and Claude Haiku; and 18,000 ICLR 2026 peer reviews.
The clearest finding: mapped in embedding space, the LLM-revised essays cluster tightly in a region no human-written essay occupies, while the human essays spread broadly. The LLM pushes every essay in the same direction regardless of revision instruction; even "fix grammar only" produces the shift. Each writer's unique lexical fingerprint is overwritten by the LLM's preferred vocabulary.
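For anyone who wants to poke at this themselves, here's a minimal sketch of that kind of analysis in Python. This is not the paper's code: the embedding model, the centroid-distance tightness measure, and the toy data are all my own assumptions.

```python
# Sketch: compare how tightly original vs. LLM-revised essays cluster
# in embedding space. Model choice and metric are assumptions, not the paper's.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedder

# Toy placeholders; in practice these would be the 86 originals and their revisions.
human_essays = ["Essay one text...", "Essay two text..."]
revised_essays = ["Revised essay one...", "Revised essay two..."]

def spread(embeddings: np.ndarray) -> float:
    """Mean Euclidean distance to the centroid: smaller = tighter cluster."""
    centroid = embeddings.mean(axis=0)
    return float(np.linalg.norm(embeddings - centroid, axis=1).mean())

h = model.encode(human_essays)    # shape: (n_essays, dim)
r = model.encode(revised_essays)

print(f"human spread:   {spread(h):.3f}")
print(f"revised spread: {spread(r):.3f}")  # expect much smaller if revision homogenizes
# The shift direction, r.mean(axis=0) - h.mean(axis=0), can be compared across
# revision instructions to test whether they all push along the same vector.
```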
The stance shift is measurable: users given LLM assistance wrote significantly more neutral essays and avoided taking definitive positions. Revision increased nouns and adjectives and decreased pronouns, making the prose more formal and less personal, and personal-experience arguments were replaced with statistical and expert-citation arguments.
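The part-of-speech claim is easy to check on your own before/after text. A rough sketch with spaCy below; the paper's exact tagger and normalization aren't specified, so the coarse spaCy tags and per-token rates here are assumptions:

```python
# Sketch: fraction of tokens that are nouns, adjectives, and pronouns,
# using spaCy's coarse POS tags (requires: python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")

def pos_rates(text: str) -> dict:
    tokens = [t for t in nlp(text) if t.is_alpha]
    n = max(len(tokens), 1)
    return {
        "noun": sum(t.pos_ == "NOUN" for t in tokens) / n,
        "adj":  sum(t.pos_ == "ADJ" for t in tokens) / n,
        "pron": sum(t.pos_ == "PRON" for t in tokens) / n,
    }

before = pos_rates("I always felt my town changed me more than school did.")
after = pos_rates("Local environments exert a significant influence on personal development.")
print(before)  # expect a higher pronoun rate
print(after)   # expect higher noun/adjective rates
```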
The ICLR 2026 finding is the sharpest institutional data point: 21% of peer reviews were AI-generated. AI-generated reviews scored papers 10% higher than human-written ones, were 136% more likely to emphasize reproducibility, and 84% more likely to emphasize scalability; human reviewers were more likely to comment on clarity, as both a strength and a weakness. The criteria being rewarded in peer review are already shifting.
The user study paradox: heavy LLM users recognized the loss of voice but reported equivalent satisfaction. The efficiency gain is immediate; the cultural cost is diffuse.
Is there a writing task where you've noticed LLM revision consistently pulling you away from what you actually wanted to say?

