MERIT: Multi-domain Efficient RAW Image Translation

arXiv cs.CV / 3/24/2026

Key Points

  • The paper introduces MERIT, a unified framework for multi-domain RAW-to-RAW image translation that uses a single model to handle arbitrary camera sensor domains rather than training separate translators for each pair.
  • It proposes a sensor-aware noise modeling loss to align the signal-dependent noise statistics of generated images with those of the target camera domain, addressing sensor-specific noise discrepancies.
  • The generator is improved with a conditional multi-scale large kernel attention module to better model contextual information and sensor-aware features.
  • The authors also release MDRAW, the first dataset designed for multi-domain RAW image translation, including paired and unpaired RAW captures from five camera sensors across varied scenes.
  • In the reported experiments, MERIT improves image quality by 5.56 dB over prior models and needs about 80% fewer training iterations, which the authors present as evidence of better scalability.
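
To put the scaling argument and the headline number in perspective, here is a small arithmetic sketch. It rests on two assumptions that the summary does not confirm: that "one translator per source-target pair" means one model per ordered pair of sensors, and that the 5.56 dB figure is a PSNR gain (the metric is not named here). The function names are hypothetical and are not taken from the paper.

```python
def pairwise_translators(n_domains, directed=True):
    """Count the per-pair models needed to cover n_domains sensors.

    directed=True counts A->B and B->A as separate translators.
    """
    ordered_pairs = n_domains * (n_domains - 1)
    return ordered_pairs if directed else ordered_pairs // 2


def mse_ratio_from_db_gain(gain_db):
    """Factor by which mean squared error shrinks for a PSNR gain in dB.

    Assumes PSNR = 10 * log10(MAX^2 / MSE), so a gain of g dB
    corresponds to an MSE ratio of 10 ** (g / 10).
    """
    return 10 ** (gain_db / 10)


# Five sensors, as in MDRAW: 20 ordered pairs versus one unified model.
print(pairwise_translators(5))
# If the gain is PSNR, 5.56 dB is roughly a 3.6x reduction in MSE.
print(round(mse_ratio_from_db_gain(5.56), 2))
```

Under these assumptions, a pairwise approach over five sensors would need 20 separate translators, which is the kind of growth the unified model is meant to avoid.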

Abstract

RAW images captured by different camera sensors exhibit substantial domain shifts due to varying spectral responses, noise characteristics, and tone behaviors, complicating their direct use in downstream computer vision tasks. Prior methods address this problem by training domain-specific RAW-to-RAW translators for each source-target pair, but such approaches do not scale to real-world scenarios involving multiple types of commercial cameras. In this work, we introduce MERIT, the first unified framework for multi-domain RAW image translation, which leverages a single model to perform translations across arbitrary camera domains. To address domain-specific noise discrepancies, we propose a sensor-aware noise modeling loss that explicitly aligns the signal-dependent noise statistics of the generated images with those of the target domain. We further enhance the generator with a conditional multi-scale large kernel attention module for improved context and sensor-aware feature modeling. To facilitate standardized evaluation, we introduce MDRAW, the first dataset tailored for multi-domain RAW image translation, comprising both paired and unpaired RAW captures from five diverse camera sensors across a wide range of scenes. Extensive experiments demonstrate that MERIT outperforms prior models in both quality (5.56 dB improvement) and scalability (80% reduction in training iterations).
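
The abstract describes the sensor-aware noise modeling loss only at a high level, so the following is a minimal sketch of the general idea and not the authors' formulation. It assumes the widely used heteroscedastic approximation in which the noise variance at a pixel grows linearly with the clean signal, var = a * signal + b, with (a, b) specific to each sensor. All function names, the binning scheme, and the squared-difference penalty are illustrative assumptions.

```python
import math
import random


def simulate_noisy(clean, a, b, rng):
    """Add zero-mean Gaussian noise with variance a * signal + b (assumed model)."""
    return [c + rng.gauss(0.0, math.sqrt(a * c + b)) for c in clean]


def fit_noise_params(clean, noisy, n_bins=16, min_count=32):
    """Estimate (a, b) in var = a * signal + b from clean/noisy pixel pairs."""
    lo, hi = min(clean), max(clean)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(clean, noisy):
        k = min(int((c - lo) / width), n_bins - 1)
        bins[k].append((c, y - c))  # store signal level and residual
    xs, vs = [], []
    for items in bins:
        if len(items) < min_count:
            continue  # too few pixels for a stable variance estimate
        n = len(items)
        mean_signal = sum(c for c, _ in items) / n
        mean_resid = sum(r for _, r in items) / n
        xs.append(mean_signal)
        vs.append(sum((r - mean_resid) ** 2 for _, r in items) / n)
    # Closed-form simple linear regression of bin variance on bin mean.
    x_bar = sum(xs) / len(xs)
    v_bar = sum(vs) / len(vs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxv = sum((x - x_bar) * (v - v_bar) for x, v in zip(xs, vs))
    a = sxv / sxx
    return a, v_bar - a * x_bar


def noise_stat_loss(generated_params, target_params):
    """Squared gap between generated and target (a, b); hypothetical penalty."""
    (ga, gb), (ta, tb) = generated_params, target_params
    return (ga - ta) ** 2 + (gb - tb) ** 2
```

In a training setup, a penalty of this kind would be computed on the generator output and compared against the target sensor's parameters; a differentiable version would be needed in practice, which this plain-Python sketch does not attempt.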