T-DuMpRa: Teacher-guided Dual-path Multi-prototype Retrieval Augmented framework for fine-grained medical image classification

arXiv cs.AI / 4/21/2026

Key Points

  • T-DuMpRa is a teacher-guided, dual-path retrieval-augmented framework designed to improve fine-grained medical image classification where subtle visual differences and ambiguous cases cause miscalibrated predictions.
  • The method jointly trains discriminative classification with multi-prototype retrieval by optimizing cross-entropy and supervised contrastive losses to learn an embedding space compatible with cosine-based prototype matching.
  • It uses an EMA (exponential moving average) teacher to produce smoother representations and builds a multi-prototype memory bank by clustering teacher embeddings in the teacher feature space.
  • During inference, it fuses the classifier’s predicted distribution with a prototype similarity distribution using a conservative confidence-gated strategy that invokes retrieval only when the classifier is uncertain and retrieval evidence is decisive.
  • Experiments on HAM10000 and ISIC2019 show consistent gains (0.21%–0.68% and 0.44%–2.69%, respectively) across five backbones, with visualization supporting improved handling of visually ambiguous cases.
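The EMA teacher and multi-prototype memory bank described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the momentum value, the number of prototypes per class, and the simple Lloyd's k-means over L2-normalized teacher embeddings are all illustrative assumptions.

```python
import numpy as np

def ema_update(teacher_params, student_params, momentum=0.999):
    """EMA teacher: teacher <- m * teacher + (1 - m) * student.
    The momentum value is an illustrative assumption."""
    return {k: momentum * teacher_params[k] + (1.0 - momentum) * student_params[k]
            for k in teacher_params}

def build_prototype_bank(embeddings, labels, protos_per_class=3, iters=10, seed=0):
    """Cluster L2-normalized teacher embeddings per class into several
    prototypes via a few Lloyd's k-means iterations (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    bank = {}
    for c in np.unique(labels):
        X = embeddings[labels == c]
        k = min(protos_per_class, len(X))
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            # on the unit sphere, cosine similarity reduces to a dot product
            assign = np.argmax(X @ centers.T, axis=1)
            for j in range(k):
                members = X[assign == j]
                if len(members):
                    centers[j] = members.mean(axis=0)
            centers /= np.linalg.norm(centers, axis=1, keepdims=True)
        bank[int(c)] = centers
    return bank
```

Keeping several prototypes per class (rather than one class mean) lets the bank cover multi-modal appearance within a lesion category, which is what makes retrieval useful for visually ambiguous cases.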

Abstract

Fine-grained medical image classification is challenged by subtle inter-class variations and visually ambiguous cases, where confidence estimates often exhibit uncertainty rather than being overconfident. In such scenarios, purely discriminative classifiers may achieve high overall accuracy yet still fail to distinguish between highly similar categories, leading to miscalibrated predictions. We propose T-DuMpRa, a teacher-guided dual-path multi-prototype retrieval-augmented framework in which discriminative classification and multi-prototype retrieval jointly drive both training and prediction. During training, we jointly optimize cross-entropy and supervised contrastive objectives to learn a cosine-compatible embedding geometry for reliable prototype matching. We further employ an exponential moving average (EMA) teacher to obtain smoother representations and build a multi-prototype memory bank by clustering teacher embeddings in the teacher embedding space. Our framework is plug-and-play: it can be integrated into existing classification models by constructing a compact prototype bank, improving performance on visually ambiguous cases. At inference, we combine the classifier's predicted distribution with a similarity-based distribution computed via cosine matching to prototypes, and apply a conservative confidence-gated fusion that activates retrieval only when the classifier's prediction is uncertain and the retrieval evidence is decisive and conflicting, otherwise keeping confident predictions unchanged. On HAM10000 and ISIC2019, our method yields 0.21%–0.68% and 0.44%–2.69% improvements across five different backbones. Visualization analysis further shows that our method enhances the model's ability to handle visually ambiguous cases.
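The confidence-gated fusion at inference can be sketched as below. This is a NumPy illustration under stated assumptions: the thresholds `tau_cls` and `tau_ret`, the fusion weight `alpha`, the softmax temperature, and the max-over-prototypes class score are all hypothetical choices, not the paper's hyperparameters.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def gated_fusion(cls_probs, embedding, bank,
                 tau_cls=0.7, tau_ret=0.8, alpha=0.5, temp=0.1):
    """Fuse classifier and retrieval distributions only when the classifier is
    uncertain AND retrieval is decisive AND the two disagree; otherwise keep
    the classifier's prediction unchanged. Thresholds are illustrative."""
    z = embedding / np.linalg.norm(embedding)
    n_classes = len(cls_probs)
    # per-class retrieval score: best cosine similarity to that class's prototypes
    sims = np.array([np.max(bank[c] @ z) for c in range(n_classes)])
    ret_probs = softmax(sims / temp)
    uncertain = cls_probs.max() < tau_cls
    decisive = ret_probs.max() > tau_ret
    conflicting = cls_probs.argmax() != ret_probs.argmax()
    if uncertain and decisive and conflicting:
        return alpha * cls_probs + (1.0 - alpha) * ret_probs
    return cls_probs  # confident predictions left unchanged
```

The triple gate is what makes the strategy conservative: retrieval can only overturn a prediction that the classifier itself is unsure about, and only when the prototype evidence is both strong and contradictory, so confident correct predictions are never perturbed.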