Do LLMs Capture Embodied Cognition and Cultural Variation? Cross-Linguistic Evidence from Demonstratives

arXiv cs.CL / 4/29/2026

Key Points

  • The paper proposes demonstratives (e.g., “this/that” and Chinese “zhe/na”) as a new probe to test whether LLMs can capture grounded, embodied cognition and culture-specific conventions from text.
  • Using 6,400 responses from 320 native speakers, the study establishes language-specific human baselines: English speakers distinguish distance reliably but struggle with perspective-taking, while Chinese speakers handle perspective shifts more fluently but accept more distal ambiguity.
  • Five state-of-the-art LLMs fail to reproduce the human proximal–distal contrast and show no cultural differences, indicating English-centric default reasoning rather than culturally grounded interpretation.
  • The authors argue the results inform the egocentric–sociocentric debate and emphasize accounting for individual variation in future model design.
  • The contribution includes a new evaluation task and empirical evidence of cross-linguistic asymmetries in how people interpret spatial expressions.

Abstract

Do large language models (LLMs) truly acquire embodied cognition and cultural conventions from text? We introduce demonstratives, fundamental spatial expressions like "this/that" in English and "zhè/nà" in Chinese, as a novel probe for grounded knowledge. Using 6,400 responses from 320 native speakers, we establish a human baseline: English speakers reliably distinguish proximal-distal referents but struggle with perspective-taking, while Chinese speakers switch perspectives fluently but tolerate distal ambiguity. In contrast, five state-of-the-art LLMs fail to inherently understand the proximal-distal contrast and show no cultural differences, defaulting to English-centric reasoning. Our study contributes (i) a new task, based on demonstratives, as a new lens for evaluating embodied cognition and cultural conventions; (ii) empirical evidence of cross-cultural asymmetries in human interpretation; (iii) a new perspective on the egocentric-sociocentric debate, showing both orientations coexist but vary across languages; and (iv) a call to address individual variation in future model design.