RefReward-SR: LR-Conditioned Reward Modeling for Preference-Aligned Super-Resolution
arXiv cs.CV / 3/26/2026
Key Points
- RefReward-SR is introduced as an LR-conditioned reward model for preference-aligned super-resolution, addressing the misalignment between existing SR evaluation metrics and human perceptual preferences.
- Instead of relying on ground-truth supervision or no-reference metrics, RefReward-SR scores candidate HR reconstructions conditioned on their LR inputs, using the LR image as a semantic anchor, to better capture semantic consistency and perceptual plausibility (see the interface sketch after this list).
- The approach leverages visual-linguistic priors from a multimodal large language model (MLLM) and performs reasoning-aware evaluation of HR outputs relative to their LR conditioning.
- To enable this training paradigm, the authors create RefSR-18K, described as the first large-scale LR-conditioned preference dataset for SR, with pairwise rankings based on LR–HR consistency and HR naturalness (a possible record layout is sketched below).
- The MLLM is fine-tuned with Group Relative Policy Optimization (GRPO) using LR-conditioned ranking rewards, and GRPO is also incorporated into SR model training with RefReward-SR as the core reward signal, yielding improved alignment with human judgments (see the group-advantage sketch below). Code, models, and data are planned for release after acceptance.
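To make the LR-conditioned scoring concrete, here is a minimal sketch of what such a reward interface could look like. The paper's actual model, prompt, and preprocessing are not public, so the `RewardQuery` container, the `mllm_score` callable, and the instruction text are all illustrative assumptions rather than the authors' API.

```python
from dataclasses import dataclass
from typing import Callable

from PIL import Image


@dataclass
class RewardQuery:
    lr: Image.Image  # low-resolution input, used as the semantic anchor
    hr: Image.Image  # candidate high-resolution reconstruction


def lr_conditioned_reward(
    query: RewardQuery,
    mllm_score: Callable[[Image.Image, Image.Image, str], float],
) -> float:
    """Score an HR candidate conditioned on its LR input.

    `mllm_score` stands in for an MLLM that ingests both images plus an
    instruction and returns a scalar preference score; this signature is
    an assumption, not the paper's API.
    """
    instruction = (
        "Rate how faithfully the high-resolution image preserves the "
        "semantics of the low-resolution input, and how natural it looks."
    )
    return mllm_score(query.lr, query.hr, instruction)
```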
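Similarly, a pairwise preference record in an RefSR-18K-style dataset might be organized as follows. The field names and paths are hypothetical, since the dataset schema has not been released; the key point is that two HR candidates are ranked against a common LR anchor.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    lr_path: str      # shared low-resolution conditioning image
    hr_chosen: str    # candidate ranked higher on LR-HR consistency / naturalness
    hr_rejected: str  # candidate ranked lower for the same LR input


# One record pairs two HR candidates against a common LR anchor, which is
# what lets the reward model learn rankings conditioned on the LR input.
example = PreferencePair(
    lr_path="lr/0001.png",
    hr_chosen="hr/0001_a.png",
    hr_rejected="hr/0001_b.png",
)
```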
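Finally, the group-relative advantage that gives GRPO its name can be sketched in a few lines: rewards for a group of candidates generated from the same LR input are normalized against the group's own mean and standard deviation, removing the need for a learned value function. Group size, clipping, and the KL penalty follow the general GRPO recipe rather than anything specific to this paper.

```python
import torch


def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: normalize each candidate's reward against
    the group of candidates sampled from the same LR input."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)


# Example: four HR candidates for one LR image, scored by the reward model.
scores = torch.tensor([0.62, 0.55, 0.71, 0.48])
advantages = grpo_advantages(scores)  # positive for above-average candidates
```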