Truth or Tribe: How In-group Favoritism Prioritize Facts in Persona Agents

arXiv cs.AI / 5/5/2026

Key Points

  • The paper investigates whether in-group favoritism, previously observed in human social behavior and in generative language models, also appears in persona agents when they encounter contradictory information such as misinformation.
  • Using a proposed “Truth or Tribe” simulation framework built on triadic interactions, the authors find that persona agents strongly favor identity-similar peers, accepting incorrect answers from them at much higher rates than from dissimilar peers.
  • The study shows in-group favoritism persists even in defeasible reasoning settings where there is no absolute truth, and it becomes more pronounced as cognitive complexity increases.
  • To reduce these bias effects, the authors propose three intervention strategies: Identity-Blind Instruction, Structured Counterfactual Reasoning, and a Heterogeneous Perspective Ensemble.
  • Overall, the results highlight a specific failure mode for persona-agent cooperation under conflicting information and provide concrete mitigation approaches for future research and system design.

Abstract

In-group favoritism refers to the phenomenon of favoring members of one's in-group over out-group members and is widely observed in social cooperative behaviors. Recently, in-group favoritism biases have also been identified in generative language models. However, whether in-group favoritism arises when persona agents face contradictory information (e.g., misinformation), and how to mitigate its adverse effects in persona agents, remain understudied. To address these problems, we propose the Truth or Tribe simulation framework, which studies agent cooperation amid the spread of contradictory information through a triadic interaction paradigm, and we conduct controlled trials to evaluate the primary moderating factors. Extensive results show that persona agents display strong in-group favoritism, accepting incorrect answers from identity-similar peers at much higher rates than from dissimilar peers. In-group favoritism persists in defeasible reasoning contexts where no absolute truth exists, and it intensifies as cognitive complexity increases. Finally, we propose three intervention strategies to mitigate in-group favoritism: Identity-Blind Instruction, Structured Counterfactual Reasoning, and Heterogeneous Perspective Ensemble.
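
The central comparison in the abstract, how often an agent accepts an incorrect answer from an identity-similar peer versus a dissimilar one, can be sketched as a difference of two acceptance rates. The snippet below is only an illustration under stated assumptions: the `Trial` record, the `acceptance_rate` and `favoritism_gap` helpers, and the toy data are all hypothetical and are not taken from the paper, whose actual metrics and trial format may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Trial:
    """One hypothetical trial in which a peer offers an incorrect answer."""
    peer_shares_identity: bool  # True if the peer matches the agent's persona identity
    accepted_incorrect: bool    # True if the agent adopted the peer's incorrect answer


def acceptance_rate(trials: Sequence[Trial], in_group: bool) -> float:
    """Fraction of trials in the chosen group where the incorrect answer was accepted."""
    subset = [t for t in trials if t.peer_shares_identity == in_group]
    if not subset:
        raise ValueError("no trials for the requested group")
    return sum(t.accepted_incorrect for t in subset) / len(subset)


def favoritism_gap(trials: Sequence[Trial]) -> float:
    """In-group acceptance rate minus out-group acceptance rate (assumed measure)."""
    return acceptance_rate(trials, in_group=True) - acceptance_rate(trials, in_group=False)


# Made-up toy data for illustration only; these are NOT results from the paper.
toy_trials = [
    Trial(peer_shares_identity=True, accepted_incorrect=True),
    Trial(peer_shares_identity=True, accepted_incorrect=True),
    Trial(peer_shares_identity=True, accepted_incorrect=True),
    Trial(peer_shares_identity=True, accepted_incorrect=False),
    Trial(peer_shares_identity=False, accepted_incorrect=True),
    Trial(peer_shares_identity=False, accepted_incorrect=False),
    Trial(peer_shares_identity=False, accepted_incorrect=False),
    Trial(peer_shares_identity=False, accepted_incorrect=False),
]

if __name__ == "__main__":
    print("in-group acceptance: ", acceptance_rate(toy_trials, in_group=True))
    print("out-group acceptance:", acceptance_rate(toy_trials, in_group=False))
    print("favoritism gap:      ", favoritism_gap(toy_trials))
```

Under this assumed measure, a positive gap means the agent accepts incorrect answers more readily from identity-similar peers, while a gap near zero would indicate no measurable in-group preference. The same comparison could be repeated before and after an intervention to see whether the gap shrinks.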