Fast-Slow Thinking RM: Efficient Integration of Scalar and Generative Reward Models
arXiv cs.CL / 2026/3/24
Key points
- The paper introduces Fast-Slow Thinking Reward Models (F/S-RM) to better align LLMs by combining efficient Scalar Reward Models (SRMs) with more accurate Generative Reward Models (GRMs).
- F/S-RM uses a dual-confidence activation mechanism to decide when to switch from fast, first-token scalar scoring to slow, chain-of-thought (CoT)-based judgment.
- Inspired by Dual Process Theory (fast, intuitive thinking versus slow, deliberate thinking), the approach trains a single model that integrates both reward paradigms.
- The reported experiments show a 1.2% relative performance improvement over state-of-the-art reward model approaches, while cutting token consumption by 20.8%.
- The authors state that code and data will be made publicly available.
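To make the fast/slow routing idea concrete, here is a minimal Python sketch of confidence-gated reward scoring. It is not the authors' implementation. The summary does not say how the dual-confidence activation works, so this sketch uses a single hypothetical confidence threshold on the fast path as a stand-in. The names `route_reward`, `token_savings`, `toy_fast` and `toy_slow`, the 0.8 threshold, and the 50-token CoT cost are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interfaces (NOT from the paper):
#   fast path -> (scalar score, confidence in [0, 1]) read from the first generated token
#   slow path -> (score, number of chain-of-thought tokens generated)
FastScorer = Callable[[str, str], Tuple[float, float]]
SlowJudge = Callable[[str, str], Tuple[float, int]]


@dataclass
class Judgment:
    score: float
    used_slow: bool
    tokens: int  # output tokens spent on this judgment


def route_reward(prompt: str, response: str, fast: FastScorer,
                 slow: SlowJudge, threshold: float = 0.8) -> Judgment:
    """Score with the fast path; escalate to CoT judging only when unsure.

    Single-threshold gate: a simplified stand-in, not the paper's dual-confidence rule.
    """
    score, confidence = fast(prompt, response)
    if confidence >= threshold:
        # Confident: accept the first-token scalar score (1 output token).
        return Judgment(score, used_slow=False, tokens=1)
    # Unsure: run the slow generative judgment on top of the first token.
    slow_score, cot_tokens = slow(prompt, response)
    return Judgment(slow_score, used_slow=True, tokens=1 + cot_tokens)


def token_savings(judgments: List[Judgment], slow_only_tokens: int) -> float:
    """Fraction of tokens saved versus running the slow judge on every sample."""
    return 1.0 - sum(j.tokens for j in judgments) / slow_only_tokens


# Toy stand-ins for the two paths (purely illustrative heuristics).
def toy_fast(prompt: str, response: str) -> Tuple[float, float]:
    n = len(response.split())
    if n <= 2:
        return 0.1, 0.95   # terse answer: confidently low
    if n >= 8:
        return 0.9, 0.9    # detailed answer: confidently high
    return 0.5, 0.4        # ambiguous length: low confidence


def toy_slow(prompt: str, response: str) -> Tuple[float, int]:
    # Pretend to reason for 50 tokens; reward answers that give a reason.
    return (1.0 if "because" in response else 0.0), 50


prompt = "What is 2 + 2?"
responses = ["ok", "It is four because 2+2",
             "The answer is four since two plus two equals four"]
judgments = [route_reward(prompt, r, toy_fast, toy_slow) for r in responses]
print([j.used_slow for j in judgments])                # [False, True, False]
print(token_savings(judgments, 50 * len(responses)))   # roughly 0.647 (1 - 53/150)
```

The point of the sketch is the cost structure: the expensive CoT path runs only on samples the fast scorer is unsure about, so token savings grow with the share of confident fast-path decisions. Whether accuracy holds up depends entirely on how well calibrated the confidence signal is, which is the part the paper's dual-confidence mechanism addresses.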