JI-ADF: Joint-Individual Learning with Adaptive Decision Fusion for Multimodal Skin Lesion Classification

arXiv cs.CV / 5/1/2026



Key Points

The paper introduces JI-ADF, a trimodal deep learning framework for skin lesion classification that combines dermoscopic images, clinical photos, and structured patient metadata.
It uses joint multimodal representation learning with modality-specific auxiliary supervision and an adaptive decision fusion mechanism that weights each modality dynamically per sample.
An MMFA (multimodal fusion attention) module is added to improve cross-modal reasoning while still preserving modality-specific evidence.
Experiments on the large-scale MILK10k benchmark, which reflects real-world clinical acquisition conditions and severe class imbalance, show improved sensitivity and Dice score while maintaining high specificity and good calibration.
The authors support the results with modality ablation studies, calibration evaluation, and Grad-CAM visualizations to demonstrate robustness and clinically meaningful behavior.
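
To make the adaptive decision fusion idea concrete, here is a minimal NumPy sketch of per-sample weighting of modality-level predictions. It is an illustration under stated assumptions, not the authors' implementation: the function names, the three-branch layout (dermoscopy, clinical photo, metadata), the class count, and the use of a softmax gate are all hypothetical, since the summary does not describe how the weights are actually computed.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def adaptive_decision_fusion(branch_logits, gate_scores):
    """Fuse per-branch class predictions with per-sample weights.

    branch_logits: (batch, n_branches, n_classes) logits from each branch.
    gate_scores:   (batch, n_branches) unnormalised scores, assumed here to
                   come from some small gating network (hypothetical).
    Returns the fused class probabilities and the per-sample branch weights.
    """
    weights = softmax(gate_scores, axis=1)            # each row sums to 1
    probs = softmax(branch_logits, axis=2)            # per-branch probabilities
    fused = (weights[:, :, None] * probs).sum(axis=1)  # convex combination
    return fused, weights


# Toy inputs: 4 samples, 3 branches, 5 classes (all sizes are illustrative).
rng = np.random.default_rng(0)
batch, n_branches, n_classes = 4, 3, 5
branch_logits = rng.normal(size=(batch, n_branches, n_classes))
gate_scores = rng.normal(size=(batch, n_branches))

fused, weights = adaptive_decision_fusion(branch_logits, gate_scores)
print(fused.shape, weights.shape)
```

Because the weights form a convex combination for each sample, the fused output remains a valid probability distribution, and a sample whose clinical photo is uninformative can, in principle, lean more heavily on the dermoscopy or metadata branch.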

Abstract

Skin lesion classification is essential for early dermatological diagnosis, yet many existing computer-aided systems rely primarily on dermoscopic images and underutilize the multimodal evidence routinely available in clinical practice. To address this gap, we propose JI-ADF, a trimodal deep learning framework that integrates dermoscopic images, clinical photographs, and structured patient metadata for clinically grounded skin lesion classification. The proposed architecture combines joint multimodal representation learning with modality-specific auxiliary supervision and an adaptive decision fusion mechanism that dynamically calibrates modality contributions on a per-sample basis. To enhance cross-modal reasoning while preserving modality-specific evidence, we further introduce a multimodal fusion attention (MMFA) module. We evaluate JI-ADF on the large-scale MILK10k benchmark, which reflects real-world clinical acquisition conditions and severe class imbalance. The proposed method demonstrates strong and well-balanced performance across lesion categories, improving sensitivity and Dice score while maintaining high specificity and good calibration. Extensive analyses, including modality ablation, calibration evaluation, and Grad-CAM visualization, further confirm the robustness and clinically meaningful behavior of the model. These results indicate that JI-ADF provides a reliable and practical foundation for multimodal skin lesion classification in real-world clinical settings.
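
The sensitivity, specificity, and Dice score mentioned in the abstract can be computed per class from a confusion matrix in a one-vs-rest fashion. The sketch below shows one common way to do this; the function name and the toy labels are hypothetical, and the exact averaging used in the paper's evaluation is not stated in this summary, so the per-class definitions here should be read as an assumption.

```python
import numpy as np


def one_vs_rest_metrics(y_true, y_pred, n_classes):
    """Per-class sensitivity, specificity, and Dice from integer labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)

    tp = np.diag(cm).astype(float)
    fn = cm.sum(axis=1) - tp          # true class c, predicted something else
    fp = cm.sum(axis=0) - tp          # predicted c, true class differs
    tn = cm.sum() - tp - fn - fp

    def safe_div(num, den):
        # Return 0 where the denominator is 0 (class absent from the data).
        return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

    sensitivity = safe_div(tp, tp + fn)
    specificity = safe_div(tn, tn + fp)
    dice = safe_div(2 * tp, 2 * tp + fp + fn)
    return sensitivity, specificity, dice


# Toy labels for three hypothetical lesion classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
sens, spec, dice = one_vs_rest_metrics(y_true, y_pred, n_classes=3)
print("sensitivity:", sens)
print("specificity:", spec)
print("dice:", dice)
```

Under severe class imbalance, reporting these values per class (or as a macro average) avoids letting the majority class dominate the summary number, which is one reason sensitivity and Dice are often reported alongside specificity.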