Deep neural networks with Fisher vector encoding for medical image classification

arXiv cs.CV / 5/5/2026

📰 NewsModels & Research

共有:

Key Points

The paper proposes using Fisher Vector (an orderless encoding method) to enhance hybrid CNN + Vision Transformer (ViT) architectures for medical image classification, especially when data availability varies.
The approach computes Fisher Vectors by estimating a Gaussian Mixture Model (GMM) over image feature representations, aiming to improve feature representation beyond existing CNN+ViT hybrids.
To address scalability limits on large datasets, the authors introduce a strategy that limits the growth in computational cost of GMM estimation as dataset size increases.
Experiments on multiple medical imaging benchmarks (MedMNIST v2, Clean-CC-CCII, and ISIC2018) show improved results, outperforming benchmarks across all MedMNIST v2 datasets and achieving competitive performance on Clean-CC-CCII and ISIC2018.
Overall, the work positions Fisher Vector-enhanced CNN+ViT models as a practical option for both small and large medical imaging datasets while managing computational trade-offs.

Abstract

Orderless encoding methods have shown to improve Convolutional Neural Networks (CNNs) for image classification in the context of limited availability of data. Additionally, hybrid CNN + Vision Transformers (ViT) models have been recently proposed to address CNN locality bias issues. These models outperformed CNN-only approaches. Despite that, the integration of such hybrid models with more elaborated feature representation can be highly beneficial and remains large unexplored in the literature. In this context, we propose the introduction of an orderless encoding method, Fisher Vectors, to hybrid CNN + ViT architectures, aiming at achieving a model suitable for both small and large datasets. Such enconding method relies on estimating a Gaussian Mixture Model (GMM) on image features. In large datasets, computational costs of the GMM estimation is a limiting factor for the application of Fisher Vectors. Thus, we propose a method to limit the growth of GMM estimation costs as we increase the size of the dataset. We explore the feasibility of our method in the context of medical image classification by appling it to MedMNIST (v2), Clean-CC-CCII and ISIC2018. This collection of datasets contains a wide variety of data scales and modalities. We outperform benchmark results in all MedMNIST (v2) datasets and obtain literature-competitive results in Clean-CC-CCII and ISIC2018.

Singapore's Fraud Frontier: Why AI Scam Detection Demands Regulatory Precision

Dev.to

From OOM to 262K Context: Running Qwen3-Coder 30B Locally on 8GB VRAM

Dev.to

Nano Banana Pro vs DALL-E 3 vs Midjourney: A Practical Comparison From Someone Who Actually Uses All Three

Dev.to

LLMs edited 86 human essays toward a semantic cluster not occupied by any human writer [D]

Reddit r/MachineLearning

Fake News Detection using Machine Learning & NLP!

Dev.to

Deep neural networks with Fisher vector encoding for medical image classification

Key Points

Abstract

Related Articles

Singapore's Fraud Frontier: Why AI Scam Detection Demands Regulatory Precision

From OOM to 262K Context: Running Qwen3-Coder 30B Locally on 8GB VRAM

Nano Banana Pro vs DALL-E 3 vs Midjourney: A Practical Comparison From Someone Who Actually Uses All Three

LLMs edited 86 human essays toward a semantic cluster not occupied by any human writer [D]

Fake News Detection using Machine Learning & NLP!

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer