Facet-Level Persona Control by Trait-Activated Routing with Contrastive SAE for Role-Playing LLMs
arXiv cs.CL · March 30, 2026
Key Points
- The paper proposes a training-efficient method for controlling role-playing agent personas using contrastively trained sparse autoencoders (SAEs) that learn facet-level personality vectors aligned with the 30 facets of the Big Five model.
- Instead of relying on prompt- or RAG-based persona signals, which can dilute over long dialogues, or on persona-labeled supervised fine-tuning, it introduces trait-activated routing to dynamically select the relevant personality facets during generation.
- The authors construct a leakage-controlled dataset of 15,000 samples with balanced supervision across facets, enabling the SAE to learn interpretable control vectors.
- Experiments on LLMs indicate more stable character fidelity and more consistent output quality than Contrastive Activation Addition (CAA) and prompt-only baselines, with the combined SAE+Prompt setup performing best.
- The dataset is released publicly on GitHub, supporting reproducibility and further research into controllable persona steering for role-playing agents (RPAs).
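The routing idea described above can be sketched in a toy form: encode the model's hidden state with an SAE, pick the most active facet latents among the persona's target facets, and add the corresponding decoder directions back into the residual stream. This is a minimal illustrative sketch, not the paper's exact method; the matrix sizes, `trait_activated_steering` function, and facet indices are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL, N_FACETS = 64, 30  # hypothetical hidden size; 30 facets as in the Big Five model

# Hypothetical SAE weights: the encoder maps hidden states to facet
# activations; each decoder row acts as a facet-level control direction.
W_enc = rng.normal(size=(N_FACETS, D_MODEL))
W_dec = rng.normal(size=(N_FACETS, D_MODEL))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)  # unit-norm directions

def trait_activated_steering(h, target_facets, alpha=4.0, top_k=3):
    """Route steering to the persona's target facets that are most active
    for the current hidden state, then add their decoder directions
    (scaled by alpha) to the residual stream."""
    acts = np.maximum(W_enc @ h, 0.0)  # ReLU facet activations
    # Mask out facets that are not part of the target persona.
    mask = np.isin(np.arange(N_FACETS), target_facets)
    masked = np.where(mask, acts, -np.inf)
    routed = np.argsort(masked)[-top_k:]          # top-k by activation
    routed = routed[np.isfinite(masked[routed])]  # drop masked-out slots
    steer = alpha * W_dec[routed].sum(axis=0)     # summed facet control vectors
    return h + steer, routed

h = rng.normal(size=D_MODEL)
persona = [2, 7, 11, 19]  # hypothetical facet indices for one character
h_steered, used_facets = trait_activated_steering(h, persona)
```

The routing step is what distinguishes this from a fixed steering vector such as CAA: which facet directions are added varies with the current hidden state, so only the persona facets relevant to the moment are amplified.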