LLMs Struggle with Abstract Meaning Comprehension More Than Expected

arXiv cs.AI / 4/15/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper argues that abstract word understanding remains difficult for language models because abstract meanings are non-concrete and rely on high-level semantics.
  • Using the SemEval-2021 Task 4 (ReCAM) cloze-style evaluation, it finds that many LLMs (including GPT-4o) perform worse on abstract-meaning questions under zero-shot, one-shot, and few-shot prompting than fine-tuned models such as BERT and RoBERTa.
  • It introduces a bidirectional attention classifier that more dynamically attends to both the passage and candidate options, aiming to mimic human cognitive strategies for abstraction.
  • The proposed method improves accuracy by 4.06% on Task 1 and 3.41% on Task 2, suggesting architectural changes and attention design can partially mitigate abstract comprehension gaps.
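
The paper's exact classifier architecture is not spelled out in this summary, but the core "bidirectional attention" idea (the passage attends to each candidate option while the option attends back to the passage, and the two attended views are then compared) can be sketched in plain NumPy. Everything below — the dot-product similarity, softmax attention in both directions, and the cosine-based fusion score — is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_attention_score(passage, option):
    """Score one candidate option against a passage.

    passage: (m, d) token embeddings; option: (n, d) token embeddings.
    Both directions of attention share one similarity matrix.
    """
    sim = passage @ option.T                    # (m, n) token-level similarities
    p2o = softmax(sim, axis=1) @ option         # passage attends to option: (m, d)
    o2p = softmax(sim.T, axis=1) @ passage      # option attends to passage: (n, d)
    # Fuse by mean-pooling each attended view and taking their cosine similarity
    a, b = p2o.mean(axis=0), o2p.mean(axis=0)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def choose(passage, options):
    """Return the index of the highest-scoring candidate option."""
    return int(np.argmax([bidirectional_attention_score(passage, o)
                          for o in options]))
```

In a real system the embeddings would come from a fine-tuned encoder (e.g. RoBERTa) and the fusion would be a learned classification head rather than a fixed cosine score; this sketch only shows where the two attention directions enter.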

Abstract

Understanding abstract meanings is crucial for advanced language comprehension. Despite extensive research, abstract words remain challenging due to their non-concrete, high-level semantics. SemEval-2021 Task 4 (ReCAM) evaluates models' ability to interpret abstract concepts by presenting passages with questions and five abstract options in a cloze-style format. Key findings include: (1) Most large language models (LLMs), including GPT-4o, struggle with abstract meaning comprehension under zero-shot, one-shot, and few-shot settings, while fine-tuned models like BERT and RoBERTa perform better. (2) A proposed bidirectional attention classifier, inspired by human cognitive strategies, enhances fine-tuned models by dynamically attending to passages and options. This approach improves accuracy by 4.06 percent on Task 1 and 3.41 percent on Task 2, demonstrating its potential for abstract meaning comprehension.
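
To make the evaluation setup concrete: ReCAM presents a passage plus a question containing an `@placeholder` token, and the model must pick which of five abstract candidate words fills the blank. A minimal zero-shot prompt builder for that format might look like the following — the exact prompt wording and option numbering are assumptions for illustration, not the prompts used in the paper:

```python
def build_cloze_prompt(passage: str, question: str, options: list[str]) -> str:
    """Assemble a zero-shot prompt for a ReCAM-style cloze question.

    `question` is expected to contain the literal token "@placeholder",
    which the model should fill with one of the candidate options.
    """
    numbered = "\n".join(f"({i}) {opt}" for i, opt in enumerate(options))
    return (
        "Read the passage and choose the option that best replaces "
        "@placeholder in the question.\n\n"
        f"Passage: {passage}\n\n"
        f"Question: {question}\n\n"
        f"Options:\n{numbered}\n\n"
        "Answer with the option number only."
    )
```

Under zero-shot prompting this string is sent as-is; one-shot and few-shot settings would prepend one or more solved examples in the same format.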