Impact of large language models on peer review opinions from a fine-grained perspective: Evidence from top conference proceedings in AI

arXiv cs.CL / 4/22/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The study analyzes how large language models (LLMs) are changing the content and signals of peer review reports using fine-grained linguistic and evaluation-level measurements.
  • It finds that after LLMs emerged, peer review comments tend to be longer and more fluent, with increased focus on summary and surface-level clarity and more standardized language patterns, especially for reviewers with lower confidence.
  • The research uses maximum likelihood estimation to detect review reports that may have been modified or generated by LLMs, and then evaluates how these LLM-assisted signals affect paper decision-making.
  • Overall, the work reports a tradeoff: while communicative quality and certain recommendation-related cues become more prominent, attention to deeper evaluative aspects like originality, replicability, and nuanced critical reasoning declines.
  • The findings suggest LLM influence may shift peer review away from deeper technical assessment toward more polished, higher-level rhetoric, potentially affecting informativeness in editorial decisions.
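The detection step in the key points above rests on a distributional maximum likelihood idea: treat the observed corpus of review texts as a mixture of a human-written word distribution and an LLM-generated one, and estimate the mixture weight that best explains observed token frequencies. The sketch below is a minimal, hypothetical illustration of that idea, not the paper's actual estimator; the token distributions and the grid-search optimizer are assumptions for demonstration.

```python
import math

def estimate_llm_fraction(counts, p_human, p_llm, grid=1000):
    """Grid-search MLE of the mixture weight alpha in
    (1 - alpha) * p_human + alpha * p_llm.

    counts  : observed token -> occurrence count in the review corpus
    p_human : token -> probability under the human-written distribution
    p_llm   : token -> probability under the LLM-generated distribution
    (Both distributions are assumed to be estimated elsewhere, e.g. from
    pre-LLM reviews and from LLM-rewritten reviews.)
    """
    best_alpha, best_ll = 0.0, -math.inf
    for i in range(grid + 1):
        a = i / grid
        ll = 0.0
        for tok, n in counts.items():
            # Mixture probability of this token; tiny floor avoids log(0).
            mix = (1 - a) * p_human.get(tok, 1e-12) + a * p_llm.get(tok, 1e-12)
            ll += n * math.log(mix)
        if ll > best_ll:
            best_alpha, best_ll = a, ll
    return best_alpha

# Toy example: an LLM distribution that overuses "delve".
p_human = {"delve": 0.01, "the": 0.99}
p_llm = {"delve": 0.20, "the": 0.80}
# Counts drawn to match a true mixture weight of 0.5.
alpha_hat = estimate_llm_fraction({"delve": 105, "the": 895}, p_human, p_llm)
```

Because the two-token log-likelihood is concave in the mixture weight, the grid search recovers the weight at which the mixture frequency of "delve" matches its empirical frequency.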

Abstract

With the rapid advancement of Large Language Models (LLMs), the academic community has faced unprecedented disruptions, particularly in the realm of academic communication. The primary function of peer review is to improve the quality of academic manuscripts along evaluation aspects such as clarity and originality. Although prior studies suggest that LLMs are beginning to influence peer review, it remains unclear whether they are altering its core evaluative functions. Moreover, the extent to which LLMs affect the linguistic form, evaluative focus, and recommendation-related signals of peer-review reports has yet to be systematically examined. In this study, we examine the changes in peer review reports for academic articles following the emergence of LLMs, emphasizing variations at a fine-grained level. Specifically, we investigate linguistic features such as the length and complexity of words and sentences in review comments, while also automatically annotating the evaluation aspects of individual review sentences. We also use a previously established maximum likelihood estimation method to identify review reports that may have been modified or generated by LLMs. Finally, we assess the impact of evaluation aspects mentioned in LLM-assisted review reports on the informativeness of recommendations for paper decision-making. The results indicate that following the emergence of LLMs, peer review texts have become longer and more fluent, with increased emphasis on summaries and surface-level clarity, as well as more standardized linguistic patterns, particularly for reviewers with lower confidence scores. At the same time, attention to deeper evaluative dimensions, such as originality, replicability, and nuanced critical reasoning, has declined.
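The fine-grained linguistic measurements the abstract describes (length and complexity of words and sentences) can be sketched with simple surface statistics. The function below is an illustrative stand-in, assuming basic regex tokenization; the paper's exact feature set and tokenizer are not specified here.

```python
import re

def linguistic_features(review_text):
    """Compute simple surface features of a review comment
    (illustrative proxies for length/complexity, not the paper's
    exact measures)."""
    # Split on sentence-ending punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", review_text) if s.strip()]
    # Crude word tokenization: alphabetic runs (apostrophes allowed).
    words = re.findall(r"[A-Za-z']+", review_text)
    return {
        "n_sentences": len(sentences),
        "n_words": len(words),
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
    }

feats = linguistic_features("Good paper. Well written!")
# feats["n_sentences"] == 2, feats["avg_word_len"] == 5.0
```

Comparing such features on reviews before and after the emergence of LLMs is one straightforward way to quantify the "longer and more fluent" shift the study reports.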