JI-ADF: Joint-Individual Learning with Adaptive Decision Fusion for Multimodal Skin Lesion Classification

arXiv cs.CV / 5/1/2026


Key Points

  • The paper introduces JI-ADF, a trimodal deep learning framework for skin lesion classification that combines dermoscopic images, clinical photos, and structured patient metadata.
  • It uses joint multimodal representation learning with modality-specific auxiliary supervision and an adaptive decision fusion mechanism that weights each modality dynamically per sample.
  • An MMFA (multimodal fusion attention) module is added to improve cross-modal reasoning while still preserving modality-specific evidence.
  • Experiments on the MILK10k benchmark, which simulates real-world clinical capture conditions and heavy class imbalance, show improved sensitivity and Dice score without sacrificing high specificity and calibration.
  • The authors support the results with modality ablation studies, calibration evaluation, and Grad-CAM visualizations to demonstrate robustness and clinically meaningful behavior.
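The key points describe adaptive decision fusion only at a high level. Below is a minimal sketch of one plausible form; it is assumed, not taken from the paper. Each modality-specific head produces class logits, and a gating network produces one reliability score per modality for the current sample. The fused prediction is then the softmax-weighted average of the per-modality class probabilities. The function names, gate scores, and logits are hypothetical.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_decision_fusion(
    modality_logits: Dict[str, List[float]],
    gate_scores: Dict[str, float],
) -> Dict[str, object]:
    """Fuse per-modality class predictions for a single sample.

    modality_logits: class logits from each modality-specific head (hypothetical).
    gate_scores: one unnormalized reliability score per modality for this
        sample, e.g. from a small gating network (hypothetical).
    """
    names = sorted(modality_logits)
    # Per-sample modality weights: softmax over the gate scores, so they sum to 1.
    weights = softmax([gate_scores[n] for n in names])
    # Class probabilities predicted by each modality on its own.
    probs = [softmax(modality_logits[n]) for n in names]
    num_classes = len(probs[0])
    # A convex combination of probability vectors is itself a distribution.
    fused = [
        sum(w * p[c] for w, p in zip(weights, probs)) for c in range(num_classes)
    ]
    return {"weights": dict(zip(names, weights)), "probs": fused}


# Example: the gate trusts the dermoscopic branch most for this sample.
out = adaptive_decision_fusion(
    {
        "dermoscopic": [2.0, 0.5, -1.0],
        "clinical": [0.2, 0.1, 0.0],
        "metadata": [0.0, 1.0, 0.0],
    },
    {"dermoscopic": 1.5, "clinical": 0.0, "metadata": -0.5},
)
print(out["weights"])
print(out["probs"])
```

The softmax makes the weights positive and forces them to sum to 1, so the fused output stays a valid probability distribution. Because the weights are recomputed for every sample, a case with a poor-quality clinical photo could, in principle, receive a low clinical weight at inference time.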

Abstract

Skin lesion classification is essential for early dermatological diagnosis, yet many existing computer-aided systems rely primarily on dermoscopic images and underutilize the multimodal evidence routinely available in clinical practice. To address this gap, we propose JI-ADF, a trimodal deep learning framework that integrates dermoscopic images, clinical photographs, and structured patient metadata for clinically grounded skin lesion classification. The proposed architecture combines joint multimodal representation learning with modality-specific auxiliary supervision and an adaptive decision fusion mechanism that dynamically calibrates modality contributions on a per-sample basis. To enhance cross-modal reasoning while preserving modality-specific evidence, we further introduce a multimodal fusion attention (MMFA) module. We evaluate JI-ADF on the large-scale MILK10k benchmark, which reflects real-world clinical acquisition conditions and severe class imbalance. The proposed method demonstrates strong and well-balanced performance across lesion categories, improving sensitivity and Dice score while maintaining high specificity and good calibration. Extensive analyses, including modality ablation, calibration evaluation, and Grad-CAM visualization, further confirm the robustness and clinically meaningful behavior of the model. These results indicate that JI-ADF provides a reliable and practical foundation for multimodal skin lesion classification in real-world clinical settings.
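The abstract pairs joint multimodal learning with modality-specific auxiliary supervision. One common way to build such a joint-individual objective, assumed here rather than taken from the paper, is to sum two parts. The first is a cross-entropy term on the fused (joint) prediction. The second is a weighted cross-entropy term on each modality's own head: L = CE(joint) + λ · Σ_m CE(m). The sketch below is pure Python. The function names and the weight aux_weight (λ = 0.5) are hypothetical.

```python
import math
from typing import Dict, List


def log_softmax(logits: List[float]) -> List[float]:
    """Log-probabilities via the log-sum-exp trick for numerical stability."""
    top = max(logits)
    lse = top + math.log(sum(math.exp(x - top) for x in logits))
    return [x - lse for x in logits]


def cross_entropy(logits: List[float], target: int) -> float:
    """Negative log-likelihood of the target class."""
    return -log_softmax(logits)[target]


def joint_individual_loss(
    joint_logits: List[float],
    modality_logits: Dict[str, List[float]],
    target: int,
    aux_weight: float = 0.5,  # hypothetical weighting, not from the paper
) -> float:
    """Joint loss on the fused head plus auxiliary losses on each modality head."""
    joint = cross_entropy(joint_logits, target)
    auxiliary = sum(
        cross_entropy(logits, target) for logits in modality_logits.values()
    )
    return joint + aux_weight * auxiliary


# With uninformative (all-zero) logits over 3 classes, every term equals log(3),
# so the total is log(3) + 0.5 * 3 * log(3) = 2.5 * log(3).
uniform = [0.0, 0.0, 0.0]
loss = joint_individual_loss(
    uniform,
    {"dermoscopic": uniform, "clinical": uniform, "metadata": uniform},
    target=1,
)
print(loss)
```

One rationale for the auxiliary terms is that they keep each branch predictive on its own, which a per-sample fusion step needs if it is to lean on a single reliable modality. Given the severe class imbalance on MILK10k, a class-weighted or focal variant of the cross-entropy would be a natural substitution. The summary does not say which loss the authors actually used.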