Where is the Mind? Persona Vectors and LLM Individuation

arXiv cs.CL / 21 April 2026


Key Points

  • The paper addresses the “individuation problem” for large language models by asking which associated entities, if any, should be identified as having “minds.”
  • It proposes mechanistic interpretability as the framework for analyzing recent findings on persona vectors, persona space, and emergent misalignment.
  • The authors argue that three views are the strongest candidates: the virtual instance view, the (virtual) instance-persona view, and the model-persona view.
  • They defend the virtual instance view on the grounds that attention streams can sustain quasi-psychological connections across token-time.
  • They review hypotheses about the internal structure underlying personas in LLMs and conclude that the two persona-based views are promising alternatives to the virtual instance view.

Abstract

The individuation problem for large language models asks which entities associated with them, if any, should be identified as minds. We approach this problem through mechanistic interpretability, engaging in particular with recent empirical work on persona vectors, persona space, and emergent misalignment. We argue that three views are the strongest candidates: the virtual instance view and two new views we introduce, the (virtual) instance-persona view and the model-persona view. First, we argue for the virtual instance view on the grounds that attention streams sustain quasi-psychological connections across token-time. Then we present the persona literature, organised around three hypotheses about the internal structure underlying personas in LLMs, and show that the two persona-based views are promising alternatives.
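The persona-vector literature the paper engages with typically identifies a persona with a direction in activation space, found as a difference of mean activations between trait-exhibiting and baseline responses. As a rough illustration of that idea only (the function names and toy data here are assumptions, not the paper's or any lab's actual code), a difference-of-means persona direction can be sketched as:

```python
import numpy as np

def persona_vector(trait_acts: np.ndarray, baseline_acts: np.ndarray) -> np.ndarray:
    """Illustrative difference-of-means persona direction.

    trait_acts, baseline_acts: (n_samples, d_model) arrays of hidden
    activations collected while the model does / does not exhibit a trait.
    Returns a unit vector pointing from the baseline mean toward the
    trait mean.
    """
    direction = trait_acts.mean(axis=0) - baseline_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def persona_score(activation: np.ndarray, direction: np.ndarray) -> float:
    """Project a single activation onto the persona direction."""
    return float(activation @ direction)

# Toy data: baseline activations, plus a copy shifted along one axis
# to stand in for "trait-exhibiting" activations.
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, size=(50, 8))
trait = baseline + np.array([2.0, 0, 0, 0, 0, 0, 0, 0])

v = persona_vector(trait, baseline)
# Trait activations score higher along the persona direction than baseline ones.
print(persona_score(trait.mean(axis=0), v) > persona_score(baseline.mean(axis=0), v))
```

On views like the model-persona view discussed above, such a direction (or the region of "persona space" it helps define) is a candidate locus of individuation, rather than the model weights or a single chat instance.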