AI Navigate

Every Error has Its Magnitude: Asymmetric Mistake Severity Training for Multiclass Multiple Instance Learning

arXiv cs.CV / 3/17/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • Introduces a mistake-severity-aware training strategy for multiclass MIL to address clinically critical errors in whole slide image diagnosis.
  • Builds a hierarchical class structure and optimizes severity-weighted cross-entropy losses to penalize high-severity misclassifications more strongly.
  • Enforces hierarchical consistency via probabilistic alignment and applies a semantic feature remix to the instance bag to robustly train class priority and to accommodate clinical cases involving multiple symptoms.
  • Proposes an asymmetric Mikel's Wheel-based metric to quantify error severity in medical domains, reports fewer critical errors than existing MIL methods, and shows generalization to non-medical (natural-domain) data.
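
To make the idea of a severity-weighted cross-entropy concrete, here is a minimal sketch in plain Python. It is not the paper's formulation, which this summary does not give. It assumes one plausible form: standard cross-entropy plus the expected severity of the predicted distribution under an asymmetric cost matrix. The names `severity_weighted_ce`, `SEVERITY`, and `lam`, the three example classes, and the cost values are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def severity_weighted_ce(logits, target, severity, lam=1.0):
    """Cross-entropy plus an expected-severity penalty (assumed form).

    severity[t][j] is the cost of predicting class j when the true class
    is t. The matrix may be asymmetric, and its diagonal should be zero.
    """
    probs = softmax(logits)
    ce = -math.log(probs[target])
    # Probability mass placed on each wrong class, weighted by its cost.
    expected_severity = sum(
        severity[target][j] * probs[j] for j in range(len(probs))
    )
    return ce + lam * expected_severity

# Hypothetical classes: 0 = benign, 1 = low-grade, 2 = malignant.
# Illustrative costs only: calling a malignant slide benign (row 2, col 0)
# is penalized more than calling a benign slide malignant (row 0, col 2).
SEVERITY = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [5.0, 3.0, 0.0],
]

# Two equally confident mistakes with different clinical cost.
missed_malignant = severity_weighted_ce([3.0, 0.0, 0.0], 2, SEVERITY)
false_alarm = severity_weighted_ce([0.0, 0.0, 3.0], 0, SEVERITY)
```

Under these assumed costs, `missed_malignant` comes out larger than `false_alarm` even though both predictions are equally confident and equally wrong in plain cross-entropy terms. That asymmetry is the behavior the bullet points describe.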

Abstract

Multiple Instance Learning (MIL) has emerged as a promising paradigm for Whole Slide Image (WSI) diagnosis, offering effective learning with limited annotations. However, existing MIL frameworks overlook diagnostic priorities and fail to differentiate the severity of misclassifications in multiclass settings, leaving clinically critical errors unaddressed. We propose a mistake-severity-aware training strategy that organizes diagnostic classes into a hierarchical structure, with each level optimized using a severity-weighted cross-entropy loss that penalizes high-severity misclassifications more strongly. Additionally, hierarchical consistency is enforced through probabilistic alignment, and a semantic feature remix is applied to the instance bag to robustly train class priority and accommodate clinical cases involving multiple symptoms. An asymmetric Mikel's Wheel-based metric is also introduced to quantify the severity of errors specific to medical fields. Experiments on challenging public and real-world in-house datasets demonstrate that our approach significantly mitigates critical errors in MIL diagnosis compared to existing methods. We present additional experimental results on natural domain data to demonstrate the generalizability of our proposed method beyond medical contexts.
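
The abstract does not spell out how probabilistic alignment works, so the following is only an illustrative sketch of one common way to tie hierarchy levels together. Fine-level probabilities are summed into their parent classes, and the coarse-level prediction is compared with that implied distribution using a KL divergence. The two-level hierarchy `FINE_TO_COARSE` and the functions `aggregate_to_coarse` and `consistency_penalty` are assumptions made for illustration, not the authors' implementation.

```python
import math

# Hypothetical two-level hierarchy: fine class index -> coarse class index.
# Here fine classes are (benign, low-grade, malignant) and the coarse
# classes are (non-tumor, tumor).
FINE_TO_COARSE = [0, 1, 1]

def aggregate_to_coarse(fine_probs, fine_to_coarse, n_coarse):
    """Sum fine-level probabilities into their parent coarse classes."""
    coarse = [0.0] * n_coarse
    for k, p in enumerate(fine_probs):
        coarse[fine_to_coarse[k]] += p
    return coarse

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q)
    )

def consistency_penalty(coarse_probs, fine_probs, fine_to_coarse):
    """Penalty that is near zero when the coarse prediction matches the
    distribution implied by the fine prediction (assumed form)."""
    implied = aggregate_to_coarse(fine_probs, fine_to_coarse, len(coarse_probs))
    return kl_divergence(coarse_probs, implied)

fine = [0.2, 0.3, 0.5]
agreeing = consistency_penalty([0.2, 0.8], fine, FINE_TO_COARSE)
conflicting = consistency_penalty([0.9, 0.1], fine, FINE_TO_COARSE)
```

In this sketch `agreeing` is approximately zero and `conflicting` is positive, so adding the penalty to a training objective would push the two levels toward mutually consistent predictions. Whether the paper uses KL divergence or a different alignment term is not stated in the abstract.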