LLMs Struggle with Abstract Meaning Comprehension More Than Expected

arXiv cs.AI / 4/15/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper argues that abstract word understanding remains difficult for language models because abstract meanings are non-concrete and rely on high-level semantics.
  • Using the SemEval-2021 Task 4 (ReCAM) cloze-style evaluation, it finds that many LLMs (including GPT-4o) perform worse on abstract-meaning questions under zero-shot, one-shot, and few-shot prompting than fine-tuned models such as BERT and RoBERTa.
  • It introduces a bidirectional attention classifier that more dynamically attends to both the passage and candidate options, aiming to mimic human cognitive strategies for abstraction.
  • The proposed method improves accuracy by 4.06% on Task 1 and 3.41% on Task 2, suggesting architectural changes and attention design can partially mitigate abstract comprehension gaps.
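
The paper's exact classifier architecture is not spelled out in this summary, but the core "bidirectional attention" idea (the passage attends to each candidate option while the option attends back to the passage, and the two attended views are then compared) can be sketched in plain NumPy. Everything below — the dot-product similarity, softmax attention in both directions, and the cosine-based fusion score — is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_attention_score(passage, option):
    """Score one candidate option against a passage.

    passage: (m, d) token embeddings; option: (n, d) token embeddings.
    Both directions of attention share one similarity matrix.
    """
    sim = passage @ option.T                    # (m, n) token-level similarities
    p2o = softmax(sim, axis=1) @ option         # passage attends to option: (m, d)
    o2p = softmax(sim.T, axis=1) @ passage      # option attends to passage: (n, d)
    # Fuse by mean-pooling each attended view and taking their cosine similarity
    a, b = p2o.mean(axis=0), o2p.mean(axis=0)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def choose(passage, options):
    """Return the index of the highest-scoring candidate option."""
    return int(np.argmax([bidirectional_attention_score(passage, o)
                          for o in options]))
```

In a real system the embeddings would come from a fine-tuned encoder (e.g. RoBERTa) and the fusion would be a learned classification head rather than a fixed cosine score; this sketch only shows where the two attention directions enter.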

Abstract

Understanding abstract meanings is crucial for advanced language comprehension. Despite extensive research, abstract words remain challenging due to their non-concrete, high-level semantics. SemEval-2021 Task 4 (ReCAM) evaluates models' ability to interpret abstract concepts by presenting passages with questions and five abstract options in a cloze-style format. Key findings include: (1) Most large language models (LLMs), including GPT-4o, struggle with abstract meaning comprehension under zero-shot, one-shot, and few-shot settings, while fine-tuned models like BERT and RoBERTa perform better. (2) A proposed bidirectional attention classifier, inspired by human cognitive strategies, enhances fine-tuned models by dynamically attending to passages and options. This approach improves accuracy by 4.06 percent on Task 1 and 3.41 percent on Task 2, demonstrating its potential for abstract meaning comprehension.
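
To make the evaluation setup concrete: ReCAM presents a passage plus a question containing an `@placeholder` token, and the model must pick which of five abstract candidate words fills the blank. A minimal zero-shot prompt builder for that format might look like the following — the exact prompt wording and option numbering are assumptions for illustration, not the prompts used in the paper:

```python
def build_cloze_prompt(passage: str, question: str, options: list[str]) -> str:
    """Assemble a zero-shot prompt for a ReCAM-style cloze question.

    `question` is expected to contain the literal token "@placeholder",
    which the model should fill with one of the candidate options.
    """
    numbered = "\n".join(f"({i}) {opt}" for i, opt in enumerate(options))
    return (
        "Read the passage and choose the option that best replaces "
        "@placeholder in the question.\n\n"
        f"Passage: {passage}\n\n"
        f"Question: {question}\n\n"
        f"Options:\n{numbered}\n\n"
        "Answer with the option number only."
    )
```

Under zero-shot prompting this string is sent as-is; one-shot and few-shot settings would prepend one or more solved examples in the same format.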