Who Wrote This Line? Evaluating the Detection of LLM-Generated Classical Chinese Poetry

arXiv cs.CL / 4/14/2026


Key Points

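  • The paper highlights why detecting LLM-generated classical Chinese poetry is difficult (strict metrical regularity, a shared system of poetic imagery, and flexible syntax) and identifies a gap in existing research.
  • It introduces ChangAn, a detection benchmark dedicated to classical Chinese poetry, providing an evaluation dataset of 30,664 poems in total (10,276 human-written and 20,388 generated by four LLMs).
  • Using ChangAn, the authors systematically evaluate 12 AI detectors and examine how performance varies with text granularity and generation strategy.
  • The results show that current Chinese text detectors are not reliable tools for detecting LLM-generated classical Chinese poetry, which supports the effectiveness and necessity of ChangAn.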

Abstract

The rapid development of large language models (LLMs) has extended text generation tasks into the literary domain. However, AI-generated literary creations have raised increasingly prominent issues of creative authenticity and ethics in the literary world, making the detection of LLM-generated literary texts essential and urgent. While previous works have made significant progress in detecting AI-generated text, they have yet to address classical Chinese poetry. Because of the distinctive linguistic features of classical Chinese poetry, such as strict metrical regularity, a shared system of poetic imagery, and flexible syntax, determining whether a poem was authored by AI is a substantial challenge. To address these issues, we introduce ChangAn, a benchmark for detecting LLM-generated classical Chinese poetry that contains 30,664 poems in total: 10,276 human-written poems and 20,388 poems generated by four popular LLMs. Based on ChangAn, we conducted a systematic evaluation of 12 AI detectors, investigating how their performance varies across different text granularities and generation strategies. Our findings highlight the limitations of current Chinese text detectors, which fail to serve as reliable tools for detecting LLM-generated classical Chinese poetry. These results validate the effectiveness and necessity of our proposed ChangAn benchmark. Our dataset and code are available at https://github.com/VelikayaScarlet/ChangAn.
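
To make the idea of evaluating a detector "across different text granularities" concrete, the following is a minimal sketch of how per-group accuracy and F1 could be computed from a detector's binary predictions. It is not the authors' evaluation code. The function name `per_group_metrics`, the group names `"line"` and `"poem"`, the label convention (1 = LLM-generated, 0 = human-written), and the toy records are all assumptions made for illustration; none of the numbers come from the paper.

```python
from collections import defaultdict


def per_group_metrics(records):
    """Compute accuracy and F1 per group (hypothetical helper).

    records: iterable of (group, true_label, predicted_label) tuples,
    where label 1 means LLM-generated and 0 means human-written
    (an assumed convention, not taken from the paper).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, y_true, y_pred in records:
        c = counts[group]
        if y_true == 1 and y_pred == 1:
            c["tp"] += 1
        elif y_true == 0 and y_pred == 1:
            c["fp"] += 1
        elif y_true == 0 and y_pred == 0:
            c["tn"] += 1
        else:  # y_true == 1 and y_pred == 0
            c["fn"] += 1

    metrics = {}
    for group, c in counts.items():
        total = sum(c.values())
        accuracy = (c["tp"] + c["tn"]) / total
        # F1 on the "LLM-generated" class; defined as 0.0 when undefined.
        denom = 2 * c["tp"] + c["fp"] + c["fn"]
        f1 = 2 * c["tp"] / denom if denom else 0.0
        metrics[group] = {"accuracy": accuracy, "f1": f1, "n": total}
    return metrics


# Made-up toy predictions, grouped by an assumed granularity label.
toy_records = [
    ("line", 1, 1), ("line", 1, 0), ("line", 0, 0), ("line", 0, 1),
    ("poem", 1, 1), ("poem", 1, 1), ("poem", 0, 0), ("poem", 0, 1),
]

if __name__ == "__main__":
    for group, m in per_group_metrics(toy_records).items():
        print(group, m)
```

The same grouping key could just as well hold a generation strategy or a source model, so one pass over the predictions yields a breakdown along whichever axis is of interest.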