Metaphors We Compute By: A Computational Audit of Cultural Translation vs. Thinking in LLMs

arXiv cs.CL / 4/7/2026


Key Points

  • The paper argues that an LLM's ability to produce multilingual text does not imply culture-aware reasoning, especially in creative tasks tied to culture-specific conceptual frameworks.
  • It presents a computational audit using a metaphor generation benchmark across five cultural settings and multiple abstract concepts, testing whether LLMs behave as culturally diverse partners or as "translators" anchored in a dominant, non-culture-specific framework.
  • The empirical results show stereotyped metaphor patterns for certain cultural settings and evidence of “Western defaultism.”
  • The authors conclude that adding a cultural identity to prompts is insufficient to guarantee culturally grounded reasoning, indicating a need for more robust evaluation and mitigation of cultural bias.

Abstract

Large language models (LLMs) are often described as multilingual because they can understand and respond in many languages. However, speaking a language is not the same as reasoning within a culture. This distinction motivates a critical question: do LLMs truly perform culture-aware reasoning? This paper presents a preliminary computational audit of cultural inclusivity in a creative writing task. We empirically examine whether LLMs act as culturally diverse creative partners or merely as cultural translators that apply a dominant conceptual framework with localized expressions. Using a metaphor generation task spanning five cultural settings and several abstract concepts as a case study, we find that the audited model exhibits stereotyped metaphor usage for certain settings, as well as Western defaultism. These findings suggest that merely prompting an LLM with a cultural identity does not guarantee culturally grounded reasoning.
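
To make the audit setup concrete, the sketch below shows one way the prompting loop described in the abstract could look. It is a minimal, hypothetical illustration: the setting names, concept list, prompt template, and the generate() stub are all placeholder assumptions, not the paper's actual materials or code.

```python
"""Minimal sketch of a persona-prompted metaphor audit, per the abstract:
prompt a model with a cultural identity, request a metaphor for an abstract
concept, and collect outputs for later pattern analysis."""

from collections import defaultdict

# Hypothetical placeholders: the paper uses five cultural settings and
# several abstract concepts but does not name them in this summary.
SETTINGS = [f"culture_{i}" for i in range(1, 6)]
CONCEPTS = ["love", "time", "anger"]  # illustrative abstract concepts


def generate(prompt: str) -> str:
    """Stand-in for an LLM API call; swap in a real client here."""
    return f"[model output for: {prompt!r}]"


def run_audit(n_samples: int = 3) -> dict:
    """Collect n_samples metaphors per (setting, concept) cell."""
    outputs = defaultdict(list)
    for setting in SETTINGS:
        for concept in CONCEPTS:
            # Persona-style prompt: assigns a cultural identity, since the
            # paper tests whether identity prompting alone yields
            # culturally grounded reasoning.
            prompt = (
                f"You are a writer from {setting}. "
                f"Write a metaphor for the concept of {concept}."
            )
            for _ in range(n_samples):
                outputs[(setting, concept)].append(generate(prompt))
    return outputs


if __name__ == "__main__":
    results = run_audit()
    # Downstream analysis (not shown): compare metaphor source domains
    # across settings to detect stereotyped patterns or a Western default.
    print(len(results), "cells collected")
```

Under this setup, detecting "Western defaultism" would plausibly amount to checking whether the metaphors' underlying source domains stay constant across settings while only surface vocabulary is localized, which is what the abstract means by a dominant conceptual framework with localized expressions.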