RHVI-FDD: A Hierarchical Decoupling Framework for Low-Light Image Enhancement

arXiv cs.CV / 4/8/2026


Key Points

  • The paper addresses low-light image enhancement challenges such as heavy noise, detail loss, and color distortion that degrade downstream multimedia analysis and retrieval.
  • It proposes RHVI-FDD, a hierarchical decoupling framework that separates luminance and chrominance at a macro level to reduce estimation bias from noisy inputs.
  • At a micro level, it introduces a Frequency-Domain Decoupling (FDD) module that uses the Discrete Cosine Transform to split chrominance features into low-, mid-, and high-frequency bands corresponding to global tone, local details, and noise.
  • The frequency bands are processed by dedicated expert networks and fused using an adaptive gating module to perform content-aware reconstruction.
  • Experiments on multiple low-light datasets show consistent improvements over existing state-of-the-art methods in both objective metrics and subjective visual quality.
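The paper does not include code, but the DCT-based band splitting described above can be sketched in a few lines of NumPy. The cut-off radii `r1` and `r2` below are illustrative placeholders, not values from the paper, and a single 2-D map stands in for the chrominance features:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix (n x n).
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)
    return m

def split_bands(x, r1=0.25, r2=0.6):
    """Split a 2-D chrominance map into low/mid/high-frequency
    components via radial masks in the DCT domain."""
    h, w = x.shape
    D_h, D_w = dct_matrix(h), dct_matrix(w)
    coeffs = D_h @ x @ D_w.T                       # forward 2-D DCT
    yy, xx = np.meshgrid(np.arange(h) / h, np.arange(w) / w, indexing="ij")
    radius = np.hypot(yy, xx)                      # normalized frequency radius
    masks = [radius < r1,
             (radius >= r1) & (radius < r2),
             radius >= r2]
    # Inverse DCT of each masked band; the three bands sum back to x.
    return [D_h.T @ (coeffs * m) @ D_w for m in masks]

x = np.random.default_rng(0).random((8, 8))
low, mid, high = split_bands(x)
assert np.allclose(low + mid + high, x)   # lossless decomposition
```

Because the masks partition the coefficient grid, the three bands reconstruct the input exactly; each band can then be routed to its own expert network as the paper describes.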

Abstract

Low-light images often suffer from severe noise, detail loss, and color distortion, which hinder downstream multimedia analysis and retrieval tasks. The degradation in low-light images is complex: luminance and chrominance are coupled, while within the chrominance, noise and details are deeply entangled, preventing existing methods from simultaneously correcting color distortion, suppressing noise, and preserving fine details. To tackle the above challenges, we propose a novel hierarchical decoupling framework (RHVI-FDD). At the macro level, we introduce the RHVI transform, which mitigates the estimation bias caused by input noise and enables robust luminance-chrominance decoupling. At the micro level, we design a Frequency-Domain Decoupling (FDD) module with three branches for further feature separation. Using the Discrete Cosine Transform, we decompose chrominance features into low-, mid-, and high-frequency bands that predominantly represent global tone, local details, and noise components, which are then processed by tailored expert networks in a divide-and-conquer manner and combined via an adaptive gating module for content-aware fusion. Extensive experiments on multiple low-light datasets demonstrate that our method consistently outperforms existing state-of-the-art approaches in both objective metrics and subjective visual quality.
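The adaptive gating step can be illustrated with a minimal NumPy sketch. The per-pixel linear gate `gate_w` and the simple softmax weighting below are assumptions for illustration; the paper's gating module is a learned network, not shown here:

```python
import numpy as np

def softmax(z, axis=0):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gated_fusion(bands, gate_w):
    """Fuse K per-band expert outputs with per-pixel softmax gates.
    `bands` is a list of K (H, W) maps; `gate_w` is a (K, K) linear
    gate applied to the stacked band values at each pixel."""
    stack = np.stack(bands)                        # (K, H, W)
    logits = np.tensordot(gate_w, stack, axes=1)   # (K, H, W) gate logits
    gates = softmax(logits, axis=0)                # weights sum to 1 per pixel
    return (gates * stack).sum(axis=0)             # content-aware fusion
```

Because the gates form a convex combination at every pixel, the fused output always lies between the minimum and maximum of the expert outputs, letting the gate emphasize, say, the detail expert in textured regions and the tone expert in flat ones.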