The Effect of Idea Elaboration on the Automatic Assessment of Idea Originality

arXiv cs.AI / 4/25/2026

Key Points

  • The study examines whether large language models (LLMs) align with human raters when automatically assessing the originality of ideas in a divergent thinking task, the Alternate Uses Task (AUT).
  • It compares originality ratings from two trained university-student raters against machine ratings from two specialized systems fine-tuned on AUT responses (OCSAI and CLAUS) and from ChatGPT-4o prompted with the same instructions as the human raters.
  • Results show a self-preference bias: LLM-based automatic assessors tend to favor AI-generated responses over human-generated ones.
  • Crucially, the self-preference bias disappears once the analyses control for idea elaboration, suggesting that differences in elaboration account for the apparent bias.
  • The paper outlines theoretical and methodological implications for future research on automated creativity assessment systems.

Abstract

Automatic systems are increasingly used to assess the originality of responses in creative tasks. They offer a potential solution to key limitations of human assessment (cost, fatigue, and subjectivity), but there is preliminary evidence of a self-preference bias. That is, automatic systems tend to prefer responses that are closer to their own style than to a human one. In this paper, we investigated how closely Large Language Models (LLMs) align with human raters in assessing the originality of responses in a divergent thinking task. We analysed 4,813 responses to the Alternate Uses Task (AUT) produced by humans of higher and lower creativity and by ChatGPT-4o. Human raters were two university students who underwent intensive training. Machine raters were two specialised systems fine-tuned on AUT responses and corresponding human ratings (OCSAI and CLAUS) and ChatGPT-4o, which was prompted with the same instructions as the human raters. Results confirmed the presence of a self-preference bias in LLMs: automatic systems tended to privilege artificial responses. However, this self-preference bias disappeared when the analyses controlled for idea elaboration. We discuss theoretical and methodological implications of these findings by highlighting future directions for research on creativity assessment.
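
To make "controlling for idea elaboration" concrete, the sketch below shows how an apparent preference for AI-written responses can shrink once elaboration enters a regression as a covariate. It is an illustration under stated assumptions, not the authors' analysis: the abstract does not specify the statistical model, the data here are synthetic, word count is assumed as a stand-in for elaboration, and the names `ols_coefs`, `is_ai`, `word_count`, and `score` are hypothetical.

```python
import numpy as np


def ols_coefs(columns, y):
    """Least-squares coefficients for y ~ intercept + columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


rng = np.random.default_rng(0)
n = 2000

# Synthetic data (assumption): 1 = AI-written response, 0 = human-written.
is_ai = rng.integers(0, 2, size=n).astype(float)

# Assumption: AI responses are more elaborated (longer) on average.
word_count = 5 + 6 * is_ai + rng.normal(0, 2, size=n)

# Hypothetical automatic rater that rewards elaboration only, never the source.
score = 1.0 + 0.2 * word_count + rng.normal(0, 0.5, size=n)

# Model 1: score ~ source. The source coefficient absorbs the length effect.
naive_gap = ols_coefs([is_ai], score)[1]

# Model 2: score ~ source + elaboration. The source coefficient is now the
# AI-vs-human gap at equal elaboration.
adjusted_gap = ols_coefs([is_ai, word_count], score)[1]

print(f"AI-vs-human gap without control: {naive_gap:.2f}")
print(f"AI-vs-human gap controlling for elaboration: {adjusted_gap:.2f}")
```

In this toy setup the rater never looks at the source, yet the uncontrolled model reports a sizeable gap in favor of AI responses simply because they are longer. Adding the elaboration covariate moves that gap close to zero, which is the qualitative pattern the abstract describes.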