Combined Dictionary Unfolding Network with Gradient-Adaptive Fidelity for Transferable Multi-Source Fusion

arXiv cs.CV / 5/4/2026


Key Points

  • The paper proposes CDNet, a lightweight Combined Dictionary Unfolding Network aimed at efficient multi-source image fusion, especially on resource-constrained edge devices.
  • Unlike prior deep unfolding approaches based on alternating minimization that update modalities separately, CDNet uses a structurally constrained joint unfolding architecture derived from coupled dictionary learning’s unique-common decomposition prior.
  • CDNet’s CDBlock employs block-sparse interaction topology and performs joint model-derived updates for common and modality-specific representations to reduce computational and memory overhead.
  • The authors introduce a compact High- and Low-frequency Image Fidelity loss to enable unsupervised training without ground-truth images.
  • Experiments across four fusion tasks (multi-exposure, infrared-visible, medical, and infrared-visible for semantic segmentation) show competitive or better performance, including PSNR gains of 1.23 dB (TNO) and 1.59 dB (RoadScene) over the second-best method in specific settings.
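To make the unique-common decomposition concrete: coupled dictionary learning models each source as a common component plus a modality-specific one, e.g. x1 ≈ Dc·zc + D1·z1 and x2 ≈ Dc·zc + D2·z2. The sketch below shows one joint ISTA-style update under that model, with the block-sparse interaction pattern the key points describe: the common code sees both residuals, each specific code sees only its own. This is an illustrative hand-rolled sketch, not the paper's CDBlock; in CDNet the dictionaries, step sizes, and thresholds would be learned network parameters.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of the l1 norm (promotes sparse codes).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def joint_ista_step(x1, x2, Dc, D1, D2, zc, z1, z2, step=0.02, tau=0.001):
    """One joint proximal-gradient update of the common code zc and the
    modality-specific codes z1, z2 under the unique-common model
    x1 ~ Dc @ zc + D1 @ z1,  x2 ~ Dc @ zc + D2 @ z2.
    (Sketch only: CDNet replaces these fixed dictionaries and
    hyperparameters with learned layers.)"""
    r1 = x1 - Dc @ zc - D1 @ z1   # residual for modality 1
    r2 = x2 - Dc @ zc - D2 @ z2   # residual for modality 2
    # Block-sparse interaction topology: zc is driven by both residuals,
    # while z1 and z2 are each driven only by their own modality.
    zc = soft_threshold(zc + step * Dc.T @ (r1 + r2), tau)
    z1 = soft_threshold(z1 + step * D1.T @ r1, tau)
    z2 = soft_threshold(z2 + step * D2.T @ r2, tau)
    return zc, z1, z2
```

Unfolding this update for a fixed number of iterations, with the operators made learnable, yields a network block of the kind the paper derives; the joint update is what avoids the separate per-modality subproblems of alternating-minimization unfoldings.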

Abstract

Deep Unfolding Network-based methods have emerged as effective solutions for multi-source image fusion by combining model-driven iterative optimization with data-driven deep learning. However, most existing deep unfolding image fusion methods are derived from alternating minimization, which updates the features of different modalities separately. This design introduces considerable computational and memory overhead, limiting deployment on resource-constrained edge devices. To address this issue, we propose CDNet, a lightweight Combined Dictionary Unfolding Network for multi-source image fusion. Rather than introducing a new sparse coding prior or empirically compressing an existing fusion network, CDNet translates the unique-common decomposition prior of coupled dictionary learning into a structurally constrained joint unfolding architecture. The resulting CDBlock follows a block-sparse interaction topology and performs a model-derived joint update of common and modality-specific representations, thereby streamlining feature learning and improving efficiency. In addition, we design a compact High- and Low-frequency Image Fidelity loss for unsupervised training without ground-truth images. We evaluate CDNet on four tasks, including multi-exposure image fusion, infrared and visible image fusion, medical image fusion, and infrared and visible image fusion for semantic segmentation. Experimental results show that CDNet achieves competitive or superior fusion performance with high efficiency. For infrared and visible image fusion, CDNet outperforms competing methods on four of six metrics on the TNO dataset and five of six metrics on the RoadScene dataset. In particular, it surpasses the second-best method by 1.23 dB and 1.59 dB in PSNR on TNO and RoadScene, respectively.
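The abstract only names the High- and Low-frequency Image Fidelity loss without giving its form, so the following is a plausible sketch of the general idea, not the paper's definition: split each image into a low-frequency base (here a simple box blur, an assumed stand-in filter) and a high-frequency residual, then anchor the fused image's base to the sources' average base and its details to the stronger of the two source details. All specifics (the filter, the averaging, the max-magnitude detail target) are illustrative assumptions.

```python
import numpy as np

def low_pass(img, k=3):
    # Box blur as an assumed low-frequency extractor; the paper's
    # actual filter is not specified in this summary.
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def hlif_loss(fused, src1, src2):
    """Hypothetical high/low-frequency fidelity loss: the fused image's
    low-frequency part should track the sources' average base layer,
    and its high-frequency part should retain the dominant details."""
    lf, l1, l2 = low_pass(fused), low_pass(src1), low_pass(src2)
    hf, h1, h2 = fused - lf, src1 - l1, src2 - l2
    # Detail target: at each pixel, keep whichever source detail is stronger.
    h_target = np.where(np.abs(h1) >= np.abs(h2), h1, h2)
    low_term = np.mean((lf - 0.5 * (l1 + l2)) ** 2)
    high_term = np.mean((hf - h_target) ** 2)
    return low_term + high_term
```

A loss of this shape needs only the source images as references, which is what makes training unsupervised: no fused ground truth is ever required.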