REAL: Regression-Aware Reinforcement Learning for LLM-as-a-Judge
arXiv cs.LG / 3/19/2026
Key Points
- REAL introduces a regression-aware RL framework for LLM evaluation that optimizes regression-based rewards instead of binary signals.
- It addresses the policy dependence of regression objectives with a generalized policy gradient estimator, which decomposes optimization into exploration over Chain-of-Thought trajectories and regression-aware score refinement.
- Experimental results across model scales (8B to 32B) show REAL consistently outperforming regression-aware SFT baselines and standard RL methods, with notable gains on Qwen3-32B (Pearson +8.40, Spearman +7.20).
- The findings highlight improved generalization to out-of-domain benchmarks and demonstrate the value of integrating regression objectives into RL exploration for more accurate LLM evaluation.
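
To make the contrast between binary and regression-based rewards concrete, here is a minimal sketch in Python. It is not REAL's estimator or reward definition, which this summary does not spell out. The function names, the exact-match binary reward, and the negative squared error reward are all assumptions chosen for illustration. The sketch also shows how Pearson and Spearman correlations, the metrics quoted above, can be computed between a judge's scores and reference scores.

```python
from math import sqrt

def binary_reward(pred: float, gold: float) -> float:
    """Hypothetical binary signal: 1 only on an exact match."""
    return 1.0 if pred == gold else 0.0

def regression_reward(pred: float, gold: float) -> float:
    """Hypothetical regression-style signal: negative squared error,
    so near misses are penalized less than far misses."""
    return -(pred - gold) ** 2

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

if __name__ == "__main__":
    gold = 5.0
    for pred in (5.0, 4.0, 1.0):
        print(pred, binary_reward(pred, gold), regression_reward(pred, gold))
```

Under the binary signal, a judge score of 4 and a score of 1 are indistinguishable when the reference is 5, since both earn zero reward. The regression-style signal separates them, which is the kind of graded feedback the key points describe.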