CFCML: A Coarse-to-Fine Crossmodal Learning Framework For Disease Diagnosis Using Multimodal Images and Tabular Data
arXiv cs.CV / 3/23/2026
Key Points
- The paper proposes a coarse-to-fine crossmodal learning (CFCML) framework to reduce the modality gap between medical images and tabular data for disease diagnosis.
- At the coarse stage, it relates multi-granularity image features drawn from different encoder stages to the tabular information, preliminarily narrowing the modality gap.
- At the fine stage, it generates unimodal and crossmodal prototypes with class-aware information and introduces a hierarchical anchor-based relationship mining (HRM) strategy to further extract discriminative crossmodal signals.
- The approach uses modality samples, unimodal prototypes, and crossmodal prototypes as anchors to drive contrastive learning, enhancing inter-class disparity while reducing intra-class disparity from multiple perspectives.
- Experiments on the MEN and Derm7pt datasets show AUC improvements of 1.53% and 0.91%, respectively; the code is released at the linked GitHub repository.
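The fine-stage idea of class-aware prototypes driving contrastive learning can be illustrated with a minimal sketch. This is an assumption-laden toy version, not the paper's HRM implementation: prototypes are taken as per-class mean embeddings, the crossmodal prototype as the mean of the two unimodal ones, and the anchor loss as a standard InfoNCE-style objective pulling each sample toward its class prototype and away from the others.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Unit-normalize embeddings so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def class_prototypes(feats, labels, num_classes):
    # Unimodal prototype per class: mean embedding of that class's samples.
    return np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])

def prototype_contrastive_loss(anchors, labels, prototypes, temperature=0.1):
    # InfoNCE-style loss: each anchor sample is attracted to its own class
    # prototype and repelled from other-class prototypes, which increases
    # inter-class disparity and reduces intra-class disparity.
    a = l2_normalize(anchors)
    p = l2_normalize(prototypes)
    logits = a @ p.T / temperature                      # (N, num_classes)
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy two-class example with image and tabular embeddings (hypothetical data).
labels = np.array([0, 0, 1, 1])
img_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
tab_feats = np.array([[0.8, 0.2], [1.0, 0.0], [0.2, 0.8], [0.0, 1.0]])

img_protos = class_prototypes(img_feats, labels, num_classes=2)
tab_protos = class_prototypes(tab_feats, labels, num_classes=2)
cross_protos = (img_protos + tab_protos) / 2.0          # crossmodal prototype

loss = prototype_contrastive_loss(img_feats, labels, cross_protos)
```

Using samples, unimodal prototypes, and crossmodal prototypes each as anchors (as the key points describe) would amount to summing several such loss terms with different anchor/target pairings.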