Persona Vectors in Games: Measuring and Steering Strategies via Activation Vectors
arXiv cs.AI / 3/24/2026
Key Points
- The paper proposes using activation steering and contrastive activation addition to build “persona vectors” in game-theoretic settings, targeting traits such as altruism, forgiveness, and expectations of others.
- Experiments on canonical games show that steering with these vectors can reliably shift both the models’ strategic decisions and their accompanying natural-language justifications.
- The study finds cases where the model's stated justification and its actual move diverge under steering, indicating that a persona vector does not shift what the model says and what it does to the same degree.
- It also reports partial distinctness between vectors for self-behavior and for expectations about others, suggesting different mechanistic subspaces within the model.
- Overall, the authors argue that persona vectors provide a promising mechanistic handle for high-level behavioral traits of LLMs used as autonomous decision-makers in strategic environments.
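
The summary above does not reproduce the paper's implementation, so the following is only a minimal sketch of the general idea behind contrastive activation addition, under stated assumptions: a persona vector is taken as the difference between mean activations on trait-present and trait-absent prompts, steering adds a scaled copy of that vector to a hidden state, and cosine similarity gives a rough measure of how distinct two vectors are. The 3-dimensional activations, the function names, and the scale `alpha` are all hypothetical and chosen for illustration; a real setup would read activations from a specific layer of an LLM.

```python
from math import sqrt

def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def persona_vector(pos_acts, neg_acts):
    """Contrastive direction: mean(trait-present) minus mean(trait-absent)."""
    pos_mean = mean_vector(pos_acts)
    neg_mean = mean_vector(neg_acts)
    return [p - q for p, q in zip(pos_mean, neg_mean)]

def steer(hidden, vector, alpha):
    """Add the persona vector, scaled by alpha, to a single hidden state."""
    return [h + alpha * v for h, v in zip(hidden, vector)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Made-up activations (assumption): prompts where the model itself acts
# altruistically versus selfishly.
altruistic = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
selfish = [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]
self_vec = persona_vector(altruistic, selfish)

# Made-up activations (assumption): prompts where the model expects
# altruism versus selfishness from the other player.
expects_altruism = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
expects_selfish = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
expect_vec = persona_vector(expects_altruism, expects_selfish)

# Steer one hidden state toward the "altruistic self" direction.
steered = steer([0.5, 0.5, 0.5], self_vec, alpha=2.0)

# Overlap between the two directions: 1.0 would mean identical directions,
# 0.0 orthogonal ones.
overlap = cosine(self_vec, expect_vec)
print(self_vec, expect_vec, steered, round(overlap, 4))
```

In this toy data the two directions share one coordinate but not the other, so their cosine similarity falls strictly between 0 and 1, which is the kind of "partial distinctness" the key points describe. The numbers carry no empirical meaning.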