NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: Professional Image Quality Assessment (Track 1)

arXiv cs.CV · April 15, 2026

📰 News · Signals & Early Trends · Models & Research

Key Points

  • NTIRE 2026 hosted the 3rd Restore Any Image Model (RAIM) challenge, with Track 1 dedicated to Professional Image Quality Assessment (PIQA) in the wild.
  • The paper argues that conventional IQA methods compress quality into a single scalar score, so they struggle to capture subtle differences among uniformly high-quality images and cannot explain why one image is superior, which downstream vision tasks would need as guidance.
  • To address this, the challenge benchmarked Multimodal Large Language Models (MLLMs) for human-expert-like evaluation of image pairs, requiring both comparative selection and interpretative, grounded reasoning.
  • Participants were evaluated on (1) choosing the better image in a high-quality pair and (2) producing expert-level explanations, with nearly 200 registrations and 2,500+ submissions.
  • The dataset and challenge resources were released publicly, and the results reportedly advanced the state of the art in professional IQA.
  • The challenge was coordinated via CodaBench, and the dataset is hosted on GitHub, enabling reuse in future research.

Abstract

In this paper, we present an overview of the NTIRE 2026 challenge on the 3rd Restore Any Image Model in the Wild, specifically focusing on Track 1: Professional Image Quality Assessment. Conventional Image Quality Assessment (IQA) typically relies on scalar scores. By compressing complex visual characteristics into a single number, these methods fundamentally struggle to distinguish subtle differences among uniformly high-quality images. Furthermore, they fail to articulate why one image is superior, lacking the reasoning capabilities required to provide guidance for vision tasks. To bridge this gap, recent advancements in Multimodal Large Language Models (MLLMs) offer a promising paradigm. Inspired by this potential, our challenge establishes a novel benchmark exploring the ability of MLLMs to mimic human expert cognition in evaluating high-quality image pairs. Participants were tasked with overcoming critical bottlenecks in professional scenarios, centering on two primary objectives: (1) Comparative Quality Selection: reliably identifying the visually superior image within a high-quality pair; and (2) Interpretative Reasoning: generating grounded, expert-level explanations that detail the rationale behind the selection. In total, the challenge attracted nearly 200 registrations and over 2,500 submissions. The top-performing methods significantly advanced the state of the art in professional IQA. The challenge dataset is available at https://github.com/narthchin/RAIM-PIQA, and the official homepage is accessible at https://www.codabench.org/competitions/12789/.
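The abstract's two objectives map naturally onto a small judging loop: show the model an image pair, obtain a selection, and obtain a grounded rationale. The sketch below illustrates one plausible way a participant might structure such a call in Python. It is purely illustrative: `query_mllm` is a hypothetical stand-in for whatever multimodal API a team actually uses (here a canned stub so the sketch runs end to end), and the JSON prompt/response contract is our assumption, not the challenge's official protocol.

```python
# Illustrative sketch of a Track 1 (PIQA) judging call: given a pair of
# high-quality restorations, ask an MLLM to (1) pick the better image and
# (2) justify the choice with grounded, expert-level reasoning.

import json
from pathlib import Path

PROMPT = (
    "You are a professional photo retoucher. Compare Image A and Image B, "
    "two high-quality restorations of the same scene. Respond in JSON with "
    'two fields: "choice" ("A" or "B") and "rationale" (2-3 sentences citing '
    "concrete evidence such as texture fidelity, color cast, edge sharpness, "
    "or residual artifacts)."
)


def query_mllm(prompt: str, image_a: Path, image_b: Path) -> str:
    """Hypothetical MLLM call; replace with a real multimodal API.

    This stub ignores the images and returns a canned response so the
    sketch is runnable without any model access.
    """
    return json.dumps({
        "choice": "A",
        "rationale": "Image A preserves fine fabric texture and shows no "
                     "ringing along high-contrast edges, whereas Image B is "
                     "slightly over-smoothed in shadow regions.",
    })


def judge_pair(image_a: Path, image_b: Path) -> tuple[str, str]:
    """Return (choice, rationale) for one image pair."""
    parsed = json.loads(query_mllm(PROMPT, image_a, image_b))
    assert parsed["choice"] in {"A", "B"}
    return parsed["choice"], parsed["rationale"]


if __name__ == "__main__":
    choice, why = judge_pair(Path("pair_001_a.png"), Path("pair_001_b.png"))
    print(f"Better image: {choice}\nRationale: {why}")
```

Under this framing, the comparative-selection objective reduces to the accuracy of `choice` against expert labels, while the `rationale` text is what the interpretative-reasoning objective would assess; how the challenge actually scores explanations is detailed in the paper and on the CodaBench homepage, not here.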