Artificial intelligence application in lymphoma diagnosis with Vision Transformer using weakly supervised training

arXiv cs.CV / 4/16/2026


Key Points

  • The study applies a Vision Transformer (ViT) to classify anaplastic large cell lymphoma (ALCL) versus classic Hodgkin lymphoma (cHL) using histology image patches.
  • It builds on earlier fully supervised results (trained on 1,200 patches) that reached 100% accuracy and an F1 score of 1.0 on an independent test set.
  • To make the approach more clinically practical, the authors switch to weakly supervised training by using slide-level labels to automatically label patch-level training data.
  • With a much larger dataset of 100,000 image patches, the weakly supervised ViT achieves evaluation metrics of 91.85% accuracy, F1 = 0.92, and AUC = 0.98.
  • The authors conclude the weakly supervised ViT is suitable as a deep learning module for clinical model development when automated patch extraction is feasible.
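The weak-supervision scheme described above boils down to propagating each slide-level diagnosis to every patch tiled from that slide, so no patch-by-patch expert annotation is needed. A minimal sketch of that labeling step is shown below; the `Patch` structure and the function name are illustrative assumptions, not the authors' actual pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Patch:
    slide_id: str
    coords: Tuple[int, int]  # top-left corner of the patch in the WSI
    label: str               # inherited from the slide-level diagnosis


def weakly_label_patches(slide_id: str, slide_label: str,
                         patch_coords: List[Tuple[int, int]]) -> List[Patch]:
    """Assign the slide-level diagnosis to every patch from that slide.

    This is the core simplification of weak supervision: instead of a
    pathologist labeling individual patches, each patch inherits the
    label of its whole-slide image. Some patches may therefore carry
    noisy labels (e.g. background or non-tumor tissue), which the model
    must tolerate during training.
    """
    return [Patch(slide_id, xy, slide_label) for xy in patch_coords]


# Example: a cHL slide tiled into three 224x224 patches
patches = weakly_label_patches("slide_007", "cHL", [(0, 0), (0, 224), (224, 0)])
```

Label noise is the trade-off: accuracy drops from the 100% of the fully supervised pilot to 91.85%, but the approach scales to 100,000 patches without any expert patch-level annotation.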

Abstract

Vision transformers (ViTs) allow for more flexible feature detection and can outperform convolutional neural networks (CNNs) when pre-trained on sufficient data. Due to their promising feature detection capabilities, we deployed ViTs for morphological classification of anaplastic large cell lymphoma (ALCL) versus classic Hodgkin lymphoma (cHL). We had previously designed a ViT model trained on a small dataset of 1,200 image patches in a fully supervised manner; that model achieved a diagnostic accuracy of 100% and an F1 score of 1.0 on the independent test set. Because fully supervised training is impractical, requiring scarce expert annotation in both the training and testing phases, we conducted a follow-up study with a modified approach to training data (weakly supervised training) and show that automatically labeling training image patches at the slide level of each whole-slide image is a more practical solution for clinical use of Vision Transformers. Our ViT model, trained on a larger dataset of 100,000 image patches, achieves an accuracy of 91.85%, an F1 score of 0.92, and an area under the curve (AUC) of 0.98. These results qualify this weakly supervised ViT model as a suitable deep learning module for clinical model development using automated image patch extraction.
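For a binary classification task like ALCL vs. cHL, the reported accuracy and F1 metrics have simple closed forms. The sketch below computes them in plain Python for a toy prediction list (AUC additionally requires continuous scores rather than hard labels, so it is omitted); the function names and the `"ALCL"` positive-class choice are illustrative assumptions.

```python
from typing import List


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of patches whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true: List[str], y_pred: List[str], positive: str = "ALCL") -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


# Toy example: 5 patches, one ALCL patch misclassified as cHL
truth = ["ALCL", "ALCL", "cHL", "cHL", "ALCL"]
preds = ["ALCL", "cHL",  "cHL", "cHL", "ALCL"]
acc = accuracy(truth, preds)  # 4 of 5 correct -> 0.8
f1_val = f1(truth, preds)     # precision 1.0, recall 2/3 -> 0.8
```

On the paper's held-out patches these formulas yield the reported 91.85% accuracy and 0.92 F1; the 0.98 AUC indicates the model's class scores rank ALCL patches above cHL patches almost everywhere along the decision threshold.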