When Meaning Isn't Literal: Exploring Idiomatic Meaning Across Languages and Modalities

arXiv cs.CL / April 14, 2026


Key Points

  • The paper argues that current language models often struggle with idiomatic and culturally grounded meaning because they over-rely on surface-level lexical/semantic cues rather than metaphorical intent.
  • It presents “Mediom,” a multilingual, multimodal corpus covering 3,533 Hindi, Bengali, and Thai idioms, with gold-standard explanations, cross-lingual translations, and aligned text–image representations, enabling evaluation of figurative disambiguation.
  • The authors benchmark both large language models (textual reasoning) and vision-language models on Mediom, finding systematic failures in metaphor and idiom comprehension.
  • To address these gaps, they propose “HIDE,” a hinting-based idiom explanation framework that uses error-feedback retrieval and targeted diagnostic cues to iteratively refine model reasoning.
  • Overall, Mediom and HIDE are positioned as a rigorous test bed and methodology for building next-generation AI systems capable of culturally grounded, multimodal idiom understanding.

Abstract

Idiomatic reasoning, deeply intertwined with metaphor and culture, remains a blind spot for contemporary language models, whose progress skews toward surface-level lexical and semantic cues. Consider the Bengali idiom *আঙ্গুর ফল টক* (angur fol tok, "grapes are sour"): it encodes denial-driven rationalization, yet naive models latch onto the literal fox-and-grape imagery. Addressing this oversight, we present "Mediom," a multilingual, multimodal idiom corpus of 3,533 Hindi, Bengali, and Thai idioms, each paired with gold-standard explanations, cross-lingual translations, and carefully aligned text–image representations. We benchmark both large language models (textual reasoning) and vision-language models (figurative disambiguation) on Mediom, exposing systematic failures in metaphor comprehension. To mitigate these gaps, we propose "HIDE," a Hinting-based Idiom Explanation framework that leverages error-feedback retrieval and targeted diagnostic cues for iterative reasoning refinement. Together, Mediom and HIDE establish a rigorous test bed and methodology for culturally grounded, multimodal idiom understanding in next-generation AI systems.