Identity as Attractor: Geometric Evidence for Persistent Agent Architecture in LLM Activation Space

arXiv cs.AI / April 15, 2026


Key Points

  • The paper investigates whether a persistent agent's identity document (its “cognitive_core” prompt) induces attractor-like dynamics in an LLM's activation space.
  • Using controlled comparisons on Llama 3.1 8B Instruct (original core vs paraphrases vs structurally matched controls), mean-pooled hidden states at layers 8, 16, and 24 show paraphrases converge to a significantly tighter cluster than controls.
  • Replication on Gemma 2 9B supports cross-architecture generalizability, suggesting the effect is not limited to a single model family.
  • Ablation results indicate the phenomenon is driven mainly by semantic content rather than surface structure, though structural completeness is still needed to reach the attractor region.
  • An exploratory test shows that merely reading a scientific description of the agent shifts activations toward the attractor more than a sham preprint, implying a difference between “knowing about” an identity and “operating as” that identity.

Abstract

Large language models map semantically related prompts to similar internal representations -- a phenomenon interpretable as attractor-like dynamics. We ask whether the identity document of a persistent cognitive agent (its cognitive_core) exhibits analogous attractor-like behavior. We present a controlled experiment on Llama 3.1 8B Instruct, comparing hidden states of an original cognitive_core (Condition A), seven paraphrases (Condition B), and seven structurally matched controls (Condition C). Mean-pooled states at layers 8, 16, and 24 show that paraphrases converge to a tighter cluster than controls (Cohen's d > 1.88, p < 10^{-27}, Bonferroni-corrected). Replication on Gemma 2 9B confirms cross-architecture generalizability. Ablations suggest the effect is primarily semantic rather than structural, and that structural completeness appears necessary to reach the attractor region. An exploratory experiment shows that reading a scientific description of the agent shifts internal state toward the attractor -- closer than a sham preprint -- distinguishing knowing about an identity from operating as that identity. These results provide representational evidence that agent identity documents induce attractor-like geometry in LLM activation space.
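The core comparison can be sketched numerically: mean-pool hidden states over token positions, compute pairwise distances within the paraphrase cluster and within the control cluster, and quantify the separation with Cohen's d. The snippet below is a minimal illustration on synthetic activation vectors (the cluster sizes match the paper's seven paraphrases and seven controls, but the dimensionality, noise scales, and function names are illustrative assumptions, not the authors' code).

```python
import numpy as np

def mean_pool(hidden_states, attention_mask):
    """Average hidden states over non-padding token positions.
    hidden_states: (batch, seq, dim); attention_mask: (batch, seq) of 0/1."""
    mask = attention_mask[:, :, None].astype(float)
    return (hidden_states * mask).sum(axis=1) / mask.sum(axis=1)

def pairwise_cosine_distances(X):
    """Upper-triangle pairwise cosine distances among row vectors of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = Xn @ Xn.T
    iu = np.triu_indices(len(X), k=1)
    return 1.0 - sims[iu]

def cohens_d(a, b):
    """Effect size (b mean minus a mean) with pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return (b.mean() - a.mean()) / pooled

# Synthetic stand-ins for mean-pooled layer activations:
# paraphrases sit close to a shared center, controls are more dispersed.
rng = np.random.default_rng(0)
center = rng.normal(size=64)
paraphrases = center + 0.1 * rng.normal(size=(7, 64))  # Condition B
controls = center + 1.0 * rng.normal(size=(7, 64))     # Condition C

d_para = pairwise_cosine_distances(paraphrases)
d_ctrl = pairwise_cosine_distances(controls)
print("paraphrase mean distance:", d_para.mean())
print("control mean distance:   ", d_ctrl.mean())
print("Cohen's d (controls looser):", cohens_d(d_para, d_ctrl))
```

A tighter paraphrase cluster shows up as a smaller mean pairwise distance and a large positive Cohen's d; in the paper this comparison is run on real mean-pooled states at layers 8, 16, and 24, with Bonferroni-corrected significance tests.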