Expected Reward Prediction, with Applications to Model Routing
arXiv cs.CL / 2026/3/24
Key Points
- The paper studies how response-level reward models can be lifted to predict an LLM's expected reward for a prompt before any responses are generated, enabling pre-generation routing decisions.
- It shows that expected reward prediction (ERP) can be both precise and discriminative, supporting an inference-time model routing protocol that optimizes reward while controlling compute costs.
- The proposed ERP-based routing is evaluated on the open-perfectblend dataset using a pool of Llama 3.1 Instruct and Gemma Instruct models, where it outperforms simpler baselines that choose the best average-performing model per prompt category.
- The approach is also presented as an explanation for why more complex routing methods work (they effectively estimate expected reward), and it is described as easy to extend when new models are added to the routing pool.
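The routing idea above can be sketched as a small selection rule: given a per-model expected-reward predictor and a compute cost per model, pick the model that maximizes predicted reward minus a cost penalty. This is a minimal illustrative sketch, not the paper's exact formulation; the function names, the stubbed predictors, the cost values, and the linear cost penalty `lam` are all assumptions.

```python
from typing import Callable, Dict

def route(prompt: str,
          erp: Dict[str, Callable[[str], float]],
          cost: Dict[str, float],
          lam: float = 0.1) -> str:
    """Pick the model whose predicted expected reward for this prompt,
    penalized by a compute-cost term, is highest.

    Hypothetical interface: `erp` maps model name -> predictor that
    returns the predicted expected reward for a prompt; `cost` maps
    model name -> relative compute cost; `lam` trades reward for cost.
    """
    return max(erp, key=lambda m: erp[m](prompt) - lam * cost[m])

# Toy usage with stubbed constant predictors (illustrative only).
predictors = {
    "llama-3.1-8b-instruct": lambda p: 0.62,
    "llama-3.1-70b-instruct": lambda p: 0.78,
    "gemma-2-9b-instruct": lambda p: 0.60,
}
costs = {
    "llama-3.1-8b-instruct": 1.0,
    "llama-3.1-70b-instruct": 8.0,
    "gemma-2-9b-instruct": 1.2,
}

choice = route("Explain beam search.", predictors, costs, lam=0.05)
```

Adding a new model to the pool only requires registering its predictor and cost, which matches the extensibility claim above.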