The Impact of Steering Large Language Models with Persona Vectors in Educational Applications

arXiv cs.CL / 4/9/2026


Key Points

  • The study finds that activation-based steering using persona vectors can personalize large language model behavior at inference time, but it generally lowers answer quality in educational short-answer generation.
  • Sensitivity to persona steering is much higher for open-ended ELA prompts than for factual science prompts, with interpretive and argumentative tasks up to 11x more sensitive.
  • In automated scoring, steered persona traits produce valence-aligned calibration shifts, where “evil/impolite” scorers grade more harshly and “good/optimistic” scorers grade more leniently.
  • The magnitude of scorer personalization varies by subject and architecture: ELA tasks are 2.5–3x more susceptible than science tasks, and a Mixture-of-Experts model shows about 6x larger calibration shifts than dense models.
  • The authors conclude this is the first systematic examination of activation-steered persona traits in educational generation and scoring and argue for task-aware, architecture-aware calibration before deployment.
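The paper does not give implementation details, but the "valence-aligned calibration shift" in the key points above can be illustrated as a mean score difference between a persona-steered scorer and the unsteered baseline. The function name, the example scores, and the 0-3 rubric are illustrative assumptions, not data from the study:

```python
import numpy as np

def calibration_shift(baseline_scores, steered_scores):
    """Mean score difference between a persona-steered scorer and the
    unsteered baseline; negative values indicate harsher grading."""
    return float(np.mean(np.array(steered_scores) - np.array(baseline_scores)))

# Hypothetical scores for the same five responses on a 0-3 rubric.
baseline = [2, 3, 1, 2, 3]
evil     = [1, 2, 1, 1, 2]   # an "evil" persona grading more harshly
optimist = [3, 3, 2, 2, 3]   # an "optimistic" persona grading more leniently

print(calibration_shift(baseline, evil))      # -0.8 (harsher)
print(calibration_shift(baseline, optimist))  # 0.4 (more lenient)
```

In this toy setup, a negative shift corresponds to the harsher "evil/impolite" scorers and a positive shift to the more lenient "good/optimistic" scorers reported by the study.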

Abstract

Activation-based steering can personalize large language models at inference time, but its effects in educational settings remain unclear. We study persona vectors for seven character traits in short-answer generation and automated scoring on the ASAP-SAS benchmark across three models spanning two architectures. Persona steering lowers answer quality overall, with much larger effects on open-ended English Language Arts (ELA) prompts than on factual science prompts; interpretive and argumentative tasks are up to 11x more sensitive. On the scoring side, we observe predictable valence-aligned calibration shifts: evil and impolite scorers grade more harshly, while good and optimistic scorers grade more leniently. ELA tasks are 2.5–3x more susceptible to scorer personalization than science tasks, and the Mixture-of-Experts model shows roughly 6x larger calibration shifts than the dense models. To our knowledge, this is the first study to systematically examine the effects of activation-steered persona traits in educational generation and scoring, and the results highlight the need for task-aware and architecture-aware calibration when deploying steered models in educational settings.
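The abstract's core mechanism, steering a model at inference time by adding a persona direction to its hidden activations, can be sketched in a few lines. The paper does not specify its steering procedure; this is a generic sketch in which the function name, the layer choice, and the scaling coefficient `alpha` are all assumptions:

```python
import numpy as np

def steer_activation(hidden, persona_vec, alpha=1.0):
    """Add a scaled persona direction to one layer's hidden-state activation.

    hidden:      shape (d,) activation vector at the chosen layer
    persona_vec: shape (d,) direction associated with a character trait
    alpha:       steering strength (sign flips the trait's valence)
    """
    direction = persona_vec / np.linalg.norm(persona_vec)
    return hidden + alpha * direction

# Toy example: steering shifts the activation's projection onto the
# persona direction by exactly alpha, leaving other directions untouched.
rng = np.random.default_rng(0)
h = rng.normal(size=8)
v = rng.normal(size=8)
steered = steer_activation(h, v, alpha=4.0)
```

In practice this addition would be applied inside a forward hook at one or more transformer layers; the sketch only shows the vector arithmetic the abstract refers to.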