RARE disease detection from Capsule Endoscopic Videos based on Vision Transformers

arXiv cs.CV / 3/20/2026

Key Points

The study fine-tunes a Vision Transformer (ViT) network for multi-label classification on capsule endoscopic videos, using Google's ViT with 16x16 patches at 224x224 input resolution.
It defines 17 labels, covering anatomical regions (mouth, esophagus, stomach, small intestine, colon, z-line, pylorus, ileocecal valve) and findings (active bleeding, angiectasia, blood, erosion, erythema, hematin, lymphangioectasis, polyp, ulcer), and tests on three videos from the Gastro Competition.
On the test set of three videos, the reported mean average precision is 0.0205 at IoU 0.5 and 0.0196 at IoU 0.95, indicating very limited performance for this task so far.
The work demonstrates the feasibility of applying transformers to capsule endoscopic video analysis but underscores the need for better datasets and architectures to improve rare-disease detection in medical imaging.
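
To make the multi-label setup concrete, here is a minimal sketch in plain Python of how the 17 labels could be encoded as a multi-hot target and decoded from per-label scores. It is not the authors' code: the helper names (`encode_multi_hot`, `decode_logits`, `bce_loss`), the 0.5 decision threshold, and the use of binary cross-entropy are assumptions chosen because they are the common default for multi-label classification.

```python
import math

# The 17 label names listed in the summary, in the order given there.
LABELS = [
    "mouth", "esophagus", "stomach", "small intestine", "colon",
    "z-line", "pylorus", "ileocecal valve",
    "active bleeding", "angiectasia", "blood", "erosion", "erythema",
    "hematin", "lymphangioectasis", "polyp", "ulcer",
]


def encode_multi_hot(active):
    """Turn a collection of label names into a 17-element 0/1 target vector."""
    unknown = set(active) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in active else 0 for name in LABELS]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_logits(logits, threshold=0.5):
    """Score each label independently and keep those at or above the threshold.

    The threshold of 0.5 is an assumption, not a value from the paper.
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return [name for name, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


def bce_loss(logits, targets):
    """Mean binary cross-entropy over the labels of a single frame."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(targets)
```

Because each label gets its own sigmoid rather than sharing a softmax, a single frame can carry an anatomical label and a finding at the same time, for example `encode_multi_hot({"small intestine", "polyp"})`.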

Abstract

This work addresses the Gastro Competition task of multi-label classification from capsule endoscopic videos (CEV). A deep learning network based on Transformers is fine-tuned for this task. The baseline model is the Google Vision Transformer (ViT) patch16 variant at 224 x 224 resolution. In total, 17 labels are classified: mouth, esophagus, stomach, small intestine, colon, z-line, pylorus, ileocecal valve, active bleeding, angiectasia, blood, erosion, erythema, hematin, lymphangioectasis, polyp, and ulcer. On the test dataset of three videos, the overall mAP@0.5 is 0.0205, whereas the overall mAP@0.95 is 0.0196.
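
The abstract does not say how the two mAP thresholds are applied. One plausible reading, assumed here, is that they are temporal IoU thresholds between a predicted frame interval and an annotated one. The sketch below illustrates only that matching step, with hypothetical helper names and made-up frame numbers; a full mAP computation would also rank predictions by confidence and average precision across labels.

```python
def temporal_iou(pred, truth):
    """IoU of two inclusive frame intervals, each given as (start, end)."""
    (ps, pe), (ts, te) = pred, truth
    # Number of frames shared by both intervals (0 if they do not overlap).
    inter = max(0, min(pe, te) - max(ps, ts) + 1)
    # Frames covered by either interval.
    union = (pe - ps + 1) + (te - ts + 1) - inter
    return inter / union


def is_match(pred, truth, threshold):
    """A prediction counts as a hit when its IoU reaches the threshold."""
    return temporal_iou(pred, truth) >= threshold


# Illustrative numbers only: a 100-frame annotated event and two predictions.
truth = (100, 199)
loose = (120, 199)   # covers 80 of the 100 annotated frames
tight = (100, 197)   # covers 98 of the 100 annotated frames
```

Under this reading, `loose` would count as a hit at the 0.5 threshold but not at 0.95, while `tight` would count at both, which is why the stricter threshold can never yield a higher score for the same predictions.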
