Zero-shot Large Language Models for Automatic Readability Assessment

arXiv cs.CL · April 28, 2026

Key Points

  • The paper introduces a new zero-shot prompting methodology for unsupervised automatic readability assessment (ARA) using large language models (LLMs); a rough sketch of this prompt-and-parse setup follows the list.
  • It reports the first comprehensive evaluation of 10 diverse open-source LLMs across 14 varied datasets, covering differences in text length and language.
  • Results show the proposed prompting approach improves performance over prior methods on 13 out of 14 datasets.
  • The authors also propose LAURAE, a hybrid approach that combines LLM outputs with traditional readability formula scores to better capture both contextual and surface-level features.
  • LAURAE demonstrates robust gains over prior methods across multiple languages, varying text lengths, and different levels of technical vocabulary.
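
To make the zero-shot setup concrete, here is a minimal sketch of what such a prompt-and-parse loop might look like. The prompt wording, the 1-to-5 scale, and the `query_llm` helper are illustrative assumptions, not the paper's actual prompting methodology.

```python
# Illustrative sketch only: the prompt template, 1-5 scale, and the
# query_llm callable are assumptions, not the paper's actual method.

def build_prompt(text: str) -> str:
    """Wrap a passage in a zero-shot readability instruction."""
    return (
        "Rate the readability of the following passage on a scale from "
        "1 (very easy) to 5 (very difficult). Reply with a single number.\n\n"
        f"Passage: {text}\n\nRating:"
    )

def parse_rating(response: str) -> int | None:
    """Pull the first digit 1-5 out of the model's reply, if present."""
    for ch in response:
        if ch in "12345":
            return int(ch)
    return None  # no usable rating in the reply

def assess(text: str, query_llm) -> int | None:
    """Score one passage with any instruction-tuned LLM callable.

    query_llm is a hypothetical stand-in for whatever inference API
    is available (e.g., a locally hosted open-source model).
    """
    return parse_rating(query_llm(build_prompt(text)))
```

Because nothing here is model-specific, the same prompt can be reused across all ten open-source LLMs the paper evaluates.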

Abstract

Unsupervised automatic readability assessment (ARA) methods have important practical and research applications (e.g., ensuring medical or educational materials are suitable for their target audiences). In this paper, we propose a new zero-shot prompting methodology for ARA and present the first comprehensive evaluation of using large language models (LLMs) as an unsupervised ARA method by testing 10 diverse open-source LLMs (e.g., different sizes and developers) on 14 diverse datasets (e.g., different text lengths and languages). Our findings show that our proposed prompting methodology outperforms prior methods on 13 of the 14 datasets. Furthermore, we propose LAURAE, which combines LLM and readability formula scores to improve robustness by capturing both contextual and shallow (e.g., sentence length) features of readability. Our evaluation demonstrates that LAURAE robustly outperforms prior methods across languages, text lengths, and amounts of technical language.
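
The summary does not spell out how LAURAE fuses the two signals, so the following is only a rough sketch of the general idea: pair a contextual LLM score with a shallow formula-style feature (here, average sentence length, one of the "shallow" features the abstract names) and blend the standardized values. The z-normalization step and the `alpha` weight are assumptions made for illustration.

```python
# Sketch of an LLM + readability-formula hybrid; the real LAURAE
# combination rule is not specified in this summary.
import re
from statistics import mean, pstdev

def avg_sentence_length(text: str) -> float:
    """Shallow readability feature: mean words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return mean(len(s.split()) for s in sentences) if sentences else 0.0

def z_normalize(scores: list[float]) -> list[float]:
    """Standardize a score list so LLM and formula outputs are comparable."""
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / sigma if sigma else 0.0 for s in scores]

def hybrid_scores(llm_scores: list[float], texts: list[str],
                  alpha: float = 0.5) -> list[float]:
    """Blend contextual (LLM) and shallow (formula) readability signals.

    alpha = 0.5 is an arbitrary illustrative weight, not a value
    taken from the paper.
    """
    formula_scores = [avg_sentence_length(t) for t in texts]
    llm_z = z_normalize(llm_scores)
    frm_z = z_normalize(formula_scores)
    return [alpha * l + (1 - alpha) * f for l, f in zip(llm_z, frm_z)]
```

Under this reading, a higher blended score marks a more difficult text, with alpha trading off the LLM's contextual judgment against the surface-level formula signal.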