Learning from Natural Language Feedback for Personalized Question Answering
arXiv cs.CL / 4/27/2026
💬 Opinion · Models & Research
Key Points
- The paper argues that existing LLM personalization for question answering typically relies on RAG combined with reinforcement learning from scalar rewards, which the authors argue provide a weak, uninstructive supervision signal for learning personalization.
- It introduces VAC (a framework for personalized response generation) that replaces scalar rewards with natural language feedback (NLF) generated from user profiles and question narratives.
- Training alternates between optimizing a feedback model and fine-tuning the policy model on the improved responses, ultimately producing a policy that does not need feedback during inference.
- Experiments on the LaMP-QA benchmark across three domains show consistent, significant gains over state-of-the-art methods, and human evaluations indicate higher response quality.
- Overall, the work presents NLF as a richer, more actionable supervision signal for improving both personalization quality and learning efficiency in personalized QA.
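The alternating training loop in the key points can be sketched as a toy simulation. All function names, the string-matching "feedback model", and the data below are illustrative assumptions, not the paper's actual implementation; in VAC both the feedback and revision steps would be performed by LLMs conditioned on user profiles.

```python
# Toy sketch of an alternating NLF training loop: a feedback model critiques
# responses in natural language, responses are revised using that feedback,
# and the policy is "fine-tuned" on the improved responses. After training,
# the policy answers without needing feedback at inference time.

def feedback_model(response: str, profile: dict) -> str:
    """Return natural-language feedback instead of a scalar reward (toy stand-in)."""
    missing = [p for p in profile["preferences"] if p not in response]
    if not missing:
        return "Good: the response reflects the user's preferences."
    return "Revise: mention " + ", ".join(missing) + "."

def revise(response: str, feedback: str) -> str:
    """Improve the response according to the feedback (stand-in for an LLM reviser)."""
    prefix = "Revise: mention "
    if feedback.startswith(prefix):
        additions = feedback[len(prefix):].rstrip(".")
        return response + " (" + additions + ")"
    return response

def train(policy: dict, question: str, profile: dict, rounds: int = 3) -> dict:
    """Alternate feedback generation and policy updates on revised responses."""
    for _ in range(rounds):
        response = policy.get(question, "A generic answer.")
        fb = feedback_model(response, profile)
        policy[question] = revise(response, fb)  # update on the improved response
    return policy

profile = {"preferences": ["budget travel", "vegetarian food"]}
policy = train({}, "Plan a weekend trip.", profile)
# At inference the trained policy is queried directly, with no feedback model:
answer = policy["Plan a weekend trip."]
```

The dictionary "policy" is of course a placeholder for supervised fine-tuning of the policy model; the point is only the control flow: feedback is text, not a number, and it is consumed during training rather than at inference.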