Reinforcement Learning with Conditional Expectation Reward
arXiv cs.LG / 3/12/2026
Key Points
- The paper introduces Conditional Expectation Reward (CER) to extend reinforcement learning with verifiable rewards by using the LLM itself as an implicit verifier, avoiding handcrafted external rules.
- CER defines the reward as the expected likelihood of generating the reference answer conditioned on the generated answer, providing a soft, graded feedback signal rather than binary checks.
- This approach removes the need for external verifiers or auxiliary models and broadens applicability from math to general reasoning tasks.
- Experimental results show CER is effective across both mathematical and general-domain reasoning tasks, indicating its flexibility as a verification mechanism.
- Code implementing CER is available at https://github.com/changyi7231/CER.
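The reward described in the second bullet can be sketched in a few lines. The snippet below uses a stub scoring function in place of a real LLM log-probability call; the function names, prompt template, and toy model are all illustrative assumptions, not taken from the paper's released code:

```python
import math

def sequence_logprob(model, context, target_tokens):
    """Sum of log-probabilities the (stub) model assigns to the target tokens.
    `model` here is just a dict token -> probability; a real LLM would
    condition on `context`, which this stub ignores for simplicity."""
    return sum(math.log(model.get(tok, 1e-9)) for tok in target_tokens)

def cer_reward(model, prompt, generated, reference):
    """CER-style soft reward: likelihood of the reference answer conditioned
    on the prompt and the model's own generated answer, length-normalized
    so longer references are not unfairly penalized."""
    context = f"{prompt}\n{generated}\nTherefore, the answer is:"
    tokens = reference.split()
    logp = sequence_logprob(model, context, tokens)
    return math.exp(logp / max(1, len(tokens)))

# Toy "model" that puts high probability on the token "42"
toy = {"42": 0.9, "43": 0.05}
r_confident = cer_reward(toy, "What is 6*7?", "I computed 6*7 = 42.", "42")
r_unsure = cer_reward(toy, "What is 6*7?", "I think it is 43.", "43")
```

The key property is that the reward is graded (here 0.9 vs. 0.05) rather than a binary pass/fail check, which is what lets CER replace handcrafted verifiers.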