ValueGround: Evaluating Culture-Conditioned Visual Value Grounding in MLLMs

arXiv cs.CL / 4/9/2026

Key Points

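This paper proposes ValueGround, a new benchmark built on the World Values Survey (WVS) that evaluates whether models can make culture-conditioned value judgments when the response options are presented as visual scenes.
ValueGround withholds the original response-option texts and instead uses minimally contrastive image pairs that depict opposing options; given a country, a question, and an image pair, the model must choose the image that best matches the country's value tendency.
Across six MLLMs and 13 countries, average accuracy falls from 72.8% in the text-only setting to 65.8% when the options are visualized, a drop of 7.0 percentage points, showing that visualization makes the task harder.
Accuracy on option-image alignment is high at 92.8%. Stronger models are more robust, but all models remain prone to prediction reversals.
The benchmark provides a controlled testbed for studying cross-modal transfer of culture-conditioned value judgments.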

Abstract

Cultural values are expressed not only through language but also through visual scenes and everyday social practices. Yet existing evaluations of cultural values in language models are almost entirely text-only, making it unclear whether models can ground culture-conditioned judgments when response options are visualized. We introduce ValueGround, a benchmark for evaluating culture-conditioned visual value grounding in multimodal large language models (MLLMs). Built from World Values Survey (WVS) questions, ValueGround uses minimally contrastive image pairs to represent opposing response options while controlling irrelevant variation. Given a country, a question, and an image pair, a model must choose the image that best matches the country's value tendency without access to the original response-option texts. Across six MLLMs and 13 countries, average accuracy drops from 72.8% in the text-only setting to 65.8% when options are visualized, despite 92.8% accuracy on option-image alignment. Stronger models are more robust, but all remain prone to prediction reversals. Our benchmark provides a controlled testbed for studying cross-modal transfer of culture-conditioned value judgments.
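The comparison reported in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the `Item` record, the country codes, the question IDs, and the toy predictions are all hypothetical. In particular, treating a "prediction reversal" as an item where the model's text-only choice differs from its choice over the image pair is an assumption, since the abstract does not define the term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One (country, question) item with a binary choice between options A and B.

    All field names are hypothetical; the paper's data format is not given here.
    """
    country: str
    question_id: str
    gold: str        # option matching the country's value tendency
    text_pred: str   # model choice when options are shown as text
    image_pred: str  # model choice when options are shown as an image pair


def accuracy(items: list[Item], field: str) -> float:
    """Fraction of items whose prediction in `field` equals the gold option."""
    if not items:
        raise ValueError("items must not be empty")
    return sum(getattr(it, field) == it.gold for it in items) / len(items)


def reversal_rate(items: list[Item]) -> float:
    """Fraction of items where the text-only and image choices disagree.

    Assumed definition of a prediction reversal, for illustration only.
    """
    if not items:
        raise ValueError("items must not be empty")
    return sum(it.text_pred != it.image_pred for it in items) / len(items)


# Toy data, invented for this example.
items = [
    Item("JP", "Q1", gold="A", text_pred="A", image_pred="A"),
    Item("JP", "Q2", gold="B", text_pred="B", image_pred="A"),  # reversal
    Item("DE", "Q1", gold="A", text_pred="B", image_pred="B"),  # wrong in both
    Item("DE", "Q2", gold="B", text_pred="B", image_pred="B"),
]

text_acc = accuracy(items, "text_pred")
image_acc = accuracy(items, "image_pred")
print(f"text-only accuracy:  {text_acc:.3f}")
print(f"visualized accuracy: {image_acc:.3f}")
print(f"accuracy drop:       {text_acc - image_acc:.3f}")
print(f"reversal rate:       {reversal_rate(items):.3f}")
```

Counting reversals separately from accuracy matters because two settings can have similar accuracy while disagreeing on many individual items.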
