Unpaired Image Deraining Using Reward-Guided Self-Reinforcement Strategy

arXiv cs.CV / 5/4/2026


Key Points

  • The paper proposes RGSUD (Reward-Guided Self-Reinforcement Unsupervised Image Deraining) to improve unsupervised deraining by learning real-world rain degradation without paired supervision.
  • It introduces an IQA-based dynamic reward recycling stage that selects high-quality derained outputs during training and continually builds a set of pseudo-clean examples.
  • A self-reinforcement (SR) training stage then incorporates these dynamically updated rewards into optimization, narrowing the search space and aligning derained results with clean images.
  • Experiments across paired synthetic, paired real, and unpaired real datasets show state-of-the-art performance versus existing unsupervised deraining methods on both subjective quality and objective IQA metrics.
  • The authors also report that the self-reinforcement strategy can be adapted to other unsupervised deraining methods and that their framework generalizes well to existing supervised deraining networks.
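The reward recycling stage described above amounts to maintaining a pool of the best derained outputs seen so far, ranked by a no-reference IQA score. The paper does not specify its exact pooling policy or IQA metric, so the following is only an illustrative sketch (class name, capacity, and eviction rule are assumptions): a fixed-capacity pool that keeps the top-scoring outputs and evicts the weakest.

```python
import heapq

class RewardPool:
    """Illustrative sketch (not the paper's exact mechanism) of
    IQA-based dynamic reward recycling: keep the top-K derained
    outputs seen during training, scored by a no-reference IQA
    function supplied by the caller."""

    def __init__(self, capacity=256):
        self.capacity = capacity
        self.heap = []      # min-heap of (score, counter, image)
        self.counter = 0    # tie-breaker so images are never compared directly

    def offer(self, image, iqa_score):
        """Add a derained output if the pool has room or if it beats
        the lowest-scoring pooled example."""
        entry = (iqa_score, self.counter, image)
        self.counter += 1
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, entry)
        elif iqa_score > self.heap[0][0]:
            heapq.heapreplace(self.heap, entry)  # evict the weakest reward

    def rewards(self):
        """Current pseudo-clean reference set, best first."""
        return [img for _, _, img in sorted(self.heap, reverse=True)]
```

Because the pool is updated continuously during training, later (typically better) outputs gradually displace early low-quality ones, which is what makes the reward set "dynamic."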

Abstract

Unsupervised deraining has attracted attention for its ability to learn the real-world distribution of rain without paired supervision. However, the lack of strong constraints makes it difficult for the network to converge, especially given the complex diversity of rain degradation. A key motivation is that high-quality deraining results occasionally emerge during training, which can be leveraged to guide the optimization process. To overcome these challenges, we introduce RGSUD (Reward-Guided Self-Reinforcement Unsupervised Image Deraining), comprising two key stages: reward recycling and self-reinforcement (SR) training. In the former stage, we propose an Image Quality Assessment (IQA)-based dynamic reward recycling mechanism that selects optimal derained outputs during training and continuously collects high-quality derained images. In the latter stage, we incorporate these rewards into the model's optimization process, constraining the optimization space and improving alignment between derained outputs and clean images. By leveraging an IQA-based self-reinforced loss and dynamically updated rewards, we enhance the quality of synthesized pseudo-paired data and stabilize the optimization. Extensive experiments demonstrate that our method achieves SOTA performance across multiple datasets, including paired synthetic, paired real, and unpaired real images, outperforming existing unsupervised deraining approaches in both subjective and objective IQA metrics. Additionally, we show that the self-reinforcement strategy is adaptable to other unsupervised deraining methods, and our deraining framework demonstrates strong generalization across existing supervised deraining networks.
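The SR training stage folds the recycled rewards back into the objective. The paper's exact loss is not given in this summary, so the sketch below is a hypothetical formulation (function name, L1 distance, and the `alpha` weight are assumptions): an unsupervised base loss augmented with a term that pulls each derained output toward its nearest pooled pseudo-clean reward, which is one simple way to constrain the optimization space as the abstract describes.

```python
import numpy as np

def self_reinforced_loss(derained, rewards, base_loss, alpha=0.1):
    """Illustrative self-reinforced loss (not the paper's exact
    formulation): augment an unsupervised base loss with an
    alignment term toward the closest high-IQA reward image.

    derained  -- derained output as a NumPy array
    rewards   -- list of pooled pseudo-clean reward images (same shape)
    base_loss -- scalar value of the unsupervised objective
    alpha     -- weight of the reinforcement term (assumed hyperparameter)
    """
    # Mean L1 distance from the output to each pooled reward
    dists = [np.abs(derained - r).mean() for r in rewards]
    reinforce = min(dists)  # align with the nearest reward only
    return base_loss + alpha * reinforce
```

As the reward pool improves over training, the nearest-reward term tightens, which matches the abstract's claim that the dynamically updated rewards both narrow the search space and stabilize optimization.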