Large language models perceive cities through a culturally uneven baseline

arXiv cs.CL / 4/23/2026


Key Points

  • The study tests whether frontier large language models describe cities from a culturally neutral standpoint, using a balanced global street-view sample and prompts that are either unmarked ("neutral") or invoke specific regional cultural standpoints.
  • Results show the “neutral” prompting condition is not actually neutral: outputs tied to Europe and North America stay systematically closer to an underlying baseline than many non-Western prompts.
  • Cultural prompting changes not only descriptive judgments but also affective evaluations, including sentiment-based ingroup preference for certain prompted identities.
  • Even when culturally closer prompting improves alignment with human descriptions, it does not fully recover human semantic diversity and often retains an affectively elevated style; structured judgments (e.g., of safety, beauty, wealth, and well-being-related impressions) show the same partial reproduction of human group differences.
  • Overall, the paper argues LLMs perceive cities through a culturally uneven reference frame rather than a universal standpoint, shaping what feels ordinary, familiar, and positively valued.
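The distance-to-baseline comparison behind these findings can be illustrated with a minimal sketch. This is not the paper's code or metric: the texts are hypothetical stand-ins for model outputs, and a simple bag-of-words cosine replaces whatever embedding-based measure the authors actually use.

```python
# Illustrative sketch only: how far do culturally prompted outputs
# drift from a "neutral" baseline description of the same scene?
from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

# Hypothetical outputs for one street-view image under different prompts.
neutral = "a tidy tree lined street with cafes and cyclists"
prompted = {
    "European standpoint": "a tidy tree lined street with cafes and bicycles",
    "Non-Western standpoint": "a busy road with market stalls and motorbikes",
}

# Distance to baseline = 1 - cosine similarity. The paper's claim is that
# Western-associated prompts tend to land closer to the "neutral" baseline.
for label, text in prompted.items():
    dist = 1 - cosine_similarity(neutral, text)
    print(f"{label}: distance to neutral baseline = {dist:.2f}")
```

In this toy setup the European-prompted text differs from the baseline by one word while the non-Western-prompted text shares almost no vocabulary with it, so the asymmetry the paper reports would show up as a smaller distance for the former.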

Abstract

Large language models (LLMs) are increasingly used to describe, evaluate and interpret places, yet it remains unclear whether they do so from a culturally neutral standpoint. Here we test urban perception in frontier LLMs using a balanced global street-view sample and prompts that either remain neutral or invoke different regional cultural standpoints. Across open-ended descriptions and structured place judgments, the neutral condition proved not to be neutral in practice. Prompts associated with Europe and Northern America remained systematically closer to the baseline than many non-Western prompts, indicating that model perception is organized around a culturally uneven reference frame rather than a universal one. Cultural prompting also shifted affective evaluation, producing sentiment-based ingroup preference for some prompted identities. Comparisons with regional human text-image benchmarks showed that culturally proximate prompting could improve alignment with human descriptions, but it did not recover human levels of semantic diversity and often preserved an affectively elevated style. The same asymmetry reappeared in structured judgments of safety, beauty, wealth, liveliness, boredom and depression, where model outputs were interpretable but only partly reproduced human group differences. These findings suggest that LLMs do not simply perceive cities from nowhere: they do so through a culturally uneven baseline that shapes what appears ordinary, familiar and positively valued.