PERSA: Reinforcement Learning for Professor-Style Personalized Feedback with LLMs
arXiv cs.AI / 5/5/2026
Key Points
- PERSA is a reinforcement learning from human feedback (RLHF) pipeline for generating programming feedback in a specific professor’s grading voice while preserving diagnostic correctness.
- The method combines supervised fine-tuning on professor demonstrations, reward modeling from pairwise preferences, and PPO, with learning deliberately constrained to style-bearing parts of the transformer.
- By updating only the top transformer blocks and their feed-forward projections (using parameter-efficient fine-tuning), PERSA reduces global parameter drift and improves stylistic controllability.
- Experiments on APPS, PyFiXV, and CodeReviewQA show strong professor-style transfer across Llama-3 and Gemma-2 backbones, with large gains in style alignment while diagnostic correctness remains high.
- The work positions PERSA as a practical approach for personalized educational feedback that aligns both “what to say” (content accuracy) and “how to say it” (tone and structure).
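The style-constrained training described above can be sketched in miniature: freeze the whole network, then unfreeze only the feed-forward projections of the top transformer blocks, so gradient updates touch the style-bearing layers while the rest of the model stays fixed. This is a minimal illustrative sketch; the toy model, block count, and layer names are assumptions, not PERSA's actual architecture or PEFT configuration.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """A toy transformer block: self-attention followed by a feed-forward sublayer."""
    def __init__(self, d=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        a, _ = self.attn(x, x, x)
        return x + self.ff(x + a)

class ToyTransformer(nn.Module):
    """A small stack of blocks standing in for an LLM backbone (assumption)."""
    def __init__(self, n_blocks=8, d=64):
        super().__init__()
        self.blocks = nn.ModuleList(Block(d) for _ in range(n_blocks))

    def forward(self, x):
        for b in self.blocks:
            x = b(x)
        return x

def restrict_to_top_ff(model, top_k=2):
    """Freeze all parameters, then unfreeze only the feed-forward
    projections of the top `top_k` blocks (the style-bearing layers)."""
    for p in model.parameters():
        p.requires_grad = False
    for block in model.blocks[-top_k:]:
        for p in block.ff.parameters():
            p.requires_grad = True

model = ToyTransformer()
restrict_to_top_ff(model, top_k=2)

# Only a small fraction of parameters remains trainable, limiting global drift.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
```

In practice this freezing step would be applied to the policy model before PPO, so the reward signal from the preference model only reshapes the unfrozen top-layer projections.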