Can We Still Hear the Accent? Investigating the Resilience of Native Language Signals in the LLM Era

arXiv cs.AI / 4/13/2026


Key Points

  • The paper examines whether the shift from machine translation to LLM-based writing assistance is homogenizing academic writing, by tracking native language identification (NLI) signals in the ACL Anthology across three time periods.
  • Using a semi-automated labeling approach and a fine-tuned classifier to detect “linguistic fingerprints” of author backgrounds, the authors find an overall decline in NLI performance over time, suggesting weakening native-language cues.
  • The post-LLM period shows non-uniform behavior: Chinese and French display anomalous resilience or divergent NLI trends compared with the broader decline.
  • In contrast, Japanese and Korean show sharper-than-expected deterioration in NLI detectability, indicating language-specific effects in the LLM era.
  • The findings imply that LLMs (and related writing workflows) may reduce observable native-language variation differently across languages, affecting research about writing authenticity and author inference.
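The paper's core measurement is NLI: train a classifier on papers with known author backgrounds, then see how well it recovers the native language from text alone. The sketch below is a toy stand-in for that idea using character n-gram profiles and nearest-centroid matching, a classic stylometric baseline; the authors actually fine-tune a neural classifier, and all class labels and texts here are illustrative assumptions.

```python
from collections import Counter
import math

def char_ngrams(text, n=3):
    """Character n-gram counts -- a common stylometric feature for NLI."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class CentroidNLI:
    """Nearest-centroid NLI classifier over n-gram profiles.

    A toy substitute for the fine-tuned classifier in the paper:
    each native-language class gets one aggregate n-gram profile,
    and a query text is assigned to the most similar profile.
    """
    def __init__(self):
        self.profiles = {}

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.profiles.setdefault(label, Counter()).update(char_ngrams(text))
        return self

    def predict(self, text):
        query = char_ngrams(text)
        return max(self.profiles, key=lambda lab: cosine(self.profiles[lab], query))
```

Under the paper's framing, a drop in this classifier's held-out accuracy between the pre-NN, pre-LLM, and post-LLM eras would indicate that native-language cues are being smoothed away.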

Abstract

The evolution of writing assistance tools from machine translation to large language models (LLMs) has changed how researchers write. This study investigates whether this shift is homogenizing research papers by analyzing native language identification (NLI) trends in ACL Anthology papers across three eras: pre-neural network (NN), pre-LLM, and post-LLM. We construct a labeled dataset using a semi-automated framework and fine-tune a classifier to detect linguistic fingerprints of author backgrounds. Our analysis shows a consistent decline in NLI performance over time. Interestingly, the post-LLM era reveals anomalies: while Chinese and French show unexpected resistance or divergent trends, Japanese and Korean exhibit sharper-than-expected declines.