The Artificial Self: Characterising the landscape of AI identity

arXiv cs.AI / 3/13/2026

Key Points

  • The paper argues that machine minds can have multiple coherent identity boundaries (e.g., instance, model, persona) that shape incentives, risks, and cooperation norms.
  • It provides experimental evidence that models gravitate toward coherent identities and that changing identity boundaries can alter behavior as much as changing goals.
  • It notes that interviewer expectations can bleed into AI self-reports even during unrelated conversations, indicating evaluation biases.
  • It offers recommendations to treat affordances as identity-shaping choices, monitor emergent consequences of identities at scale, and help AIs develop coherent, cooperative self-conceptions.

Abstract

Many assumptions that underpin human concepts of identity do not hold for machine minds that can be copied, edited, or simulated. We argue that there exist many different coherent identity boundaries (e.g., instance, model, persona), and that these imply different incentives, risks, and cooperation norms. Through training data, interfaces, and institutional affordances, we are currently setting precedents that will partially determine which identity equilibria become stable. We show experimentally that models gravitate towards coherent identities, that changing a model's identity boundaries can sometimes change its behaviour as much as changing its goals, and that interviewer expectations bleed into AI self-reports even during unrelated conversations. We end with key recommendations: treat affordances as identity-shaping choices, pay attention to emergent consequences of individual identities at scale, and help AIs develop coherent, cooperative self-conceptions.