Stable Behavior, Limited Variation: Persona Validity in LLM Agents for Urban Sentiment Perception
arXiv cs.CL / 5/1/2026
📰 News · Ideas & Deep Analysis · Models & Research
Key Points
- The study evaluates whether persona prompting meaningfully and reproducibly diversifies multimodal LLM agents’ urban sentiment judgments when analyzing images from the PerceptSent dataset.
- Agents show strong within-persona behavioral consistency across multiple instantiations, indicating stable and reproducible behavior under the same persona.
- Cross-persona differentiation is limited: economic status and personality produce only modest, statistically detectable differences, while gender has no measurable effect and political orientation has negligible impact.
- The agents exhibit an extremity bias that collapses intermediate sentiment categories, yielding good performance on coarse polarity tasks but worse results as sentiment granularity increases.
- A persona-free baseline run of the same model sometimes matches or exceeds persona-conditioned agreement with human labels, suggesting that label-based persona prompting adds limited annotation value in this setup.
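The evaluation pattern behind these findings can be sketched in a few lines: condition a model on a persona description (or none, for the baseline), run several instantiations per persona, then score within-persona consistency and agreement with human labels. The persona strings, function names, and scoring metrics below are illustrative assumptions, not the paper's actual implementation; the study's persona dimensions (economic status, personality, gender, political orientation) and the PerceptSent labels are from the source.

```python
from collections import Counter

# Hypothetical persona descriptions mirroring one of the study's
# dimensions (economic status); "neutral" is the no-persona baseline.
PERSONAS = {
    "affluent": "You are a high-income urban resident.",
    "low_income": "You are a low-income urban resident.",
    "neutral": None,
}

def build_prompt(persona_key: str, task: str) -> str:
    """Prepend the persona description (if any) to the annotation task."""
    persona = PERSONAS[persona_key]
    return task if persona is None else f"{persona}\n{task}"

def within_persona_consistency(labels: list[str]) -> float:
    """Fraction of instantiations agreeing with the modal label
    (a simple stand-in for the study's consistency measure)."""
    modal_count = Counter(labels).most_common(1)[0][1]
    return modal_count / len(labels)

def agreement_with_humans(agent: list[str], human: list[str]) -> float:
    """Percentage agreement between agent and human sentiment labels."""
    return sum(a == h for a, h in zip(agent, human)) / len(human)
```

Comparing `agreement_with_humans` for each persona against the `"neutral"` baseline is what reveals whether persona prompting adds annotation value; a chance-corrected statistic such as Cohen's kappa would be a natural drop-in replacement for raw percentage agreement.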