A Hybrid Architecture for Benign-Malignant Classification of Mammography ROIs

arXiv cs.CV / 4/15/2026

Key Points

  • The paper targets accurate benign-versus-malignant classification of mammography lesions, framed as ROI-level binary classification on the CBIS-DDSM dataset.
  • It proposes a hybrid model that uses EfficientNetV2-M to extract local visual patterns while employing Vision Mamba (a state space model) to capture global context more efficiently than quadratic-cost Vision Transformers.
  • The motivation is that CNNs struggle with long-range dependencies and ViTs can be computationally prohibitive, so the hybrid design targets both accuracy and efficiency.
  • The authors report strong lesion-level classification performance in an ROI-based setting and position linear-complexity sequence modeling as a practical alternative for medical image classification.
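
The quadratic-versus-linear contrast in the key points can be made concrete with a small arithmetic sketch. It counts token-pair interactions for full self-attention (N × N) against sequential state updates for a recurrent scan (N), where N is the number of tokens in a flattened feature grid. The grid sizes and function names are hypothetical, chosen only for illustration; the summary does not state the paper's token counts, and real costs also depend on channel width and implementation.

```python
def attention_pair_count(num_tokens: int) -> int:
    """Token-pair interactions in full self-attention: grows as N^2."""
    return num_tokens * num_tokens


def scan_step_count(num_tokens: int) -> int:
    """State updates in a sequential state space scan: grows as N."""
    return num_tokens


def compare(grid_sides):
    """Return (side, tokens, attention_pairs, scan_steps) per grid size.

    Each grid is assumed square, so tokens = side * side.
    """
    rows = []
    for side in grid_sides:
        n = side * side
        rows.append((side, n, attention_pair_count(n), scan_step_count(n)))
    return rows


if __name__ == "__main__":
    # Hypothetical feature-grid sizes, not taken from the paper.
    for side, n, attn, scan in compare([8, 16, 32]):
        print(f"{side}x{side} grid: {n} tokens, "
              f"{attn} attention pairs, {scan} scan steps")
```

Doubling the grid side quadruples the token count, which multiplies the attention pair count by 16 but the scan step count by only 4.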

Abstract

Accurate characterization of suspicious breast lesions in mammography is important for early diagnosis and treatment planning. While Convolutional Neural Networks (CNNs) are effective at extracting local visual patterns, they are less suited to modeling long-range dependencies. Vision Transformers (ViTs) address this limitation through self-attention, but their quadratic computational cost can be prohibitive. This paper presents a hybrid architecture that combines EfficientNetV2-M for local feature extraction with Vision Mamba, a State Space Model (SSM), for efficient global context modeling. The proposed model performs binary classification of abnormality-centered mammography regions of interest (ROIs) from the CBIS-DDSM dataset into benign and malignant classes. By combining a strong CNN backbone with a linear-complexity sequence model, the approach achieves strong lesion-level classification performance in an ROI-based setting.
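
To illustrate the idea of linear-complexity global context modeling, the sketch below flattens a small 2-D feature grid into a token sequence, runs a single-pass linear state space recurrence over it, pools the outputs, and applies a logistic function to obtain a binary score. This is a toy under stated assumptions, not the paper's model: the state is a single scalar with fixed parameters `a`, `b`, `c`, whereas Vision Mamba uses learned, input-dependent parameters and bidirectional scans over high-dimensional tokens. All names, parameter values, and the pooling and classification head are hypothetical.

```python
import math


def flatten_feature_map(fmap):
    """Row-major flatten of a 2-D grid into a 1-D token sequence."""
    return [value for row in fmap for value in row]


def ssm_scan(tokens, a=0.9, b=0.1, c=1.0):
    """Linear recurrence h_t = a*h_{t-1} + b*x_t with output y_t = c*h_t.

    One pass over the sequence, so the cost is linear in its length.
    The fixed scalars a, b, c are illustrative placeholders.
    """
    h = 0.0
    outputs = []
    for x in tokens:
        h = a * h + b * x
        outputs.append(c * h)
    return outputs


def classify(fmap, weight=1.0, bias=0.0):
    """Mean-pool the scan outputs and map them to a score in (0, 1)."""
    tokens = flatten_feature_map(fmap)
    ys = ssm_scan(tokens)
    pooled = sum(ys) / len(ys)
    return 1.0 / (1.0 + math.exp(-(weight * pooled + bias)))


if __name__ == "__main__":
    # Hypothetical 2x3 feature grid standing in for CNN backbone output.
    toy_map = [[0.2, 0.8, 0.1],
               [0.5, 0.9, 0.3]]
    print(classify(toy_map))
```

Because each output depends on the running state, every token can be influenced by all earlier tokens in the sequence without any pairwise comparison, which is the property the hybrid design relies on for global context.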