InstanceRSR: Real-World Super-Resolution via Instance-Aware Representation Alignment

arXiv cs.CV / 3/26/2026


Key Points

  • The paper identifies a key weakness in current real-world super-resolution (RSR) methods: denoising losses like MSE promote global consistency but fail to adequately recover fine-grained, instance-level details in complex scenes.
  • It proposes InstanceRSR, which combines global consistency guidance from low-resolution inputs with semantic relevance enforcement using semantic segmentation maps during sampling.
  • InstanceRSR adds an instance representation learning module that aligns the diffusion latent space with the instance latent space, so that the learned features become instance-aware.
  • It further introduces a scale alignment mechanism aimed at improving fine-grained perception and detail recovery.
  • The authors report new state-of-the-art results on multiple real-world benchmarks, with gains in both quantitative metrics and visual quality while preserving semantic consistency at the instance level.
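
The first point, that a pixel-averaged loss such as MSE can under-weight small instances, can be illustrated with a toy example. The sketch below is not taken from the paper: the function names `global_mse` and `instance_balanced_mse`, the 8x8 scene, and the idea of averaging per-instance errors are assumptions chosen only to make the effect visible.

```python
import numpy as np

def global_mse(pred, target):
    """Plain MSE averaged over every pixel in the image."""
    return float(np.mean((pred - target) ** 2))

def instance_balanced_mse(pred, target, seg):
    """Mean of per-segment MSEs, so each segment counts equally regardless of area.

    Hypothetical comparison metric, not the loss used by InstanceRSR.
    """
    per_segment = [
        np.mean((pred[seg == k] - target[seg == k]) ** 2)
        for k in np.unique(seg)
    ]
    return float(np.mean(per_segment))

# Toy 8x8 scene: label 0 is background, label 1 is a small 2x2 object.
seg = np.zeros((8, 8), dtype=int)
seg[3:5, 3:5] = 1
target = np.zeros((8, 8))
target[seg == 1] = 1.0

# A reconstruction that gets the background right but misses the object entirely.
pred = np.zeros((8, 8))

g = global_mse(pred, target)                  # 4 wrong pixels out of 64
b = instance_balanced_mse(pred, target, seg)  # background error 0, object error 1

print(f"global MSE: {g:.4f}, instance-balanced MSE: {b:.4f}")
```

Here the object is lost completely, yet the global MSE stays small because the object covers only 4 of 64 pixels. Weighting each segment equally makes the same failure much more visible.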

Abstract

Existing real-world super-resolution (RSR) methods based on generative priors have achieved remarkable progress in producing high-quality and globally consistent reconstructions. However, they often struggle to recover fine-grained details of diverse object instances in complex real-world scenes. This limitation primarily arises because commonly adopted denoising losses (e.g., MSE) inherently favor global consistency while neglecting instance-level perception and restoration. To address this issue, we propose InstanceRSR, a novel RSR framework that jointly models semantic information and introduces instance-level feature alignment. Specifically, we employ low-resolution (LR) images as global consistency guidance while jointly modeling image data and semantic segmentation maps to enforce semantic relevance during sampling. Moreover, we design an instance representation learning module to align the diffusion latent space with the instance latent space, enabling instance-aware feature alignment, and further incorporate a scale alignment mechanism to enhance fine-grained perception and detail recovery. Benefiting from these designs, our approach not only generates photorealistic details but also preserves semantic consistency at the instance level. Extensive experiments on multiple real-world benchmarks demonstrate that InstanceRSR significantly outperforms existing methods in both quantitative metrics and visual quality, achieving new state-of-the-art (SOTA) performance.
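
The abstract does not say how the diffusion latent space is aligned with the instance latent space. As a rough sketch of what such an alignment could look like, the code below pools a feature map inside each segmentation region and scores it against per-instance reference features with a cosine-similarity loss. Everything here is an assumption made for illustration: the helper names `masked_average_pool` and `cosine_alignment_loss`, the cosine objective, and the random arrays standing in for real diffusion and instance features.

```python
import numpy as np

def masked_average_pool(features, seg):
    """Average a (C, H, W) feature map inside each segment; returns (K, C)."""
    labels = np.unique(seg)
    return np.stack([features[:, seg == k].mean(axis=1) for k in labels])

def cosine_alignment_loss(a, b, eps=1e-8):
    """1 minus the mean cosine similarity between matching rows of a and b."""
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return float(1.0 - np.mean(np.sum(a_n * b_n, axis=1)))

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8

# Segmentation map with a background (0) and one small instance (1).
seg = np.zeros((H, W), dtype=int)
seg[3:5, 3:5] = 1

# Stand-ins only: real features would come from the diffusion model
# and from an instance-level encoder, neither of which is specified here.
diffusion_feats = rng.normal(size=(C, H, W))
instance_feats = rng.normal(size=(2, C))

pooled = masked_average_pool(diffusion_feats, seg)           # shape (2, C)
loss_random = cosine_alignment_loss(pooled, instance_feats)  # unaligned features
loss_self = cosine_alignment_loss(pooled, pooled)            # perfectly aligned

print(pooled.shape, loss_random, loss_self)
```

The loss is close to zero when the pooled features already match the reference features and grows as they diverge, which is the kind of signal an alignment module would minimize during training.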