REAL: Regression-Aware Reinforcement Learning for LLM-as-a-Judge
arXiv cs.LG / 3/19/2026
Key Points
- REAL introduces a regression-aware RL framework for LLM evaluation that optimizes regression-based rewards instead of binary signals.
- It addresses the policy dependence of regression objectives with a generalized policy gradient estimator that decomposes optimization into two parts: exploration over Chain-of-Thought trajectories and regression-aware score refinement.
- Experimental results across model scales (8B to 32B) show REAL consistently outperforming regression-aware SFT baselines and standard RL methods, with notable gains on Qwen3-32B (Pearson +8.40, Spearman +7.20).
- The findings highlight improved generalization to out-of-domain benchmarks and demonstrate the value of integrating regression objectives into RL exploration for more accurate LLM evaluation.
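
To make the contrast between binary and regression-based rewards concrete, here is a minimal Python sketch. It is not the paper's implementation: the summary above does not give REAL's reward or estimator, so the 1-to-5 score range, the squared-error reward shape, and every function name below are assumptions chosen for illustration. The sketch also shows how the Pearson and Spearman correlations cited in the results are typically computed between judge scores and reference scores.

```python
import math

def binary_reward(pred, gold):
    # Binary signal: full credit only for an exact match.
    return 1.0 if pred == gold else 0.0

def regression_reward(pred, gold, lo=1.0, hi=5.0):
    # Assumed reward shape: 1 minus the squared error, normalized by the
    # score range, so near misses earn partial credit.
    return 1.0 - ((pred - gold) / (hi - lo)) ** 2

def expected_score(probs):
    # Regression-style point estimate: probability-weighted mean over
    # candidate scores (probs maps score -> probability).
    total = sum(probs.values())
    return sum(s * p for s, p in probs.items()) / total

def pearson(xs, ys):
    # Pearson correlation between two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def ranks(vals):
    # 1-based ranks; tied values share the average of their positions.
    order = sorted(range(len(vals)), key=lambda i: vals[i])
    r = [0.0] * len(vals)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and vals[order[j + 1]] == vals[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    # Spearman correlation is Pearson correlation computed on ranks.
    return pearson(ranks(xs), ranks(ys))

if __name__ == "__main__":
    gold = 5
    for pred in (5, 4, 1):
        print(pred, binary_reward(pred, gold), regression_reward(pred, gold))
    print(expected_score({4: 0.5, 5: 0.5}))
```

Under these assumptions, a predicted score of 4 against a reference of 5 earns nothing from the binary reward but most of the credit from the regression reward, which is the kind of graded signal the key points describe.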