Co-distilled attention guided masked image modeling with noisy teacher for self-supervised learning on medical images

arXiv cs.CV / 4/17/2026


Key Points

  • The paper proposes an attention-guided masking strategy for masked image modeling (MIM) tailored to medical images, aiming to reduce information leakage caused by random masking in locally similar contexts.
  • Because Swin transformers lack a global [CLS] token, the authors introduce a co-distillation framework that selectively masks semantically co-occurring, discriminative patches to make self-supervised pretraining harder and more effective.
  • The authors identify a key limitation of attention-guided masking—reduced diversity across attention heads—which can hurt downstream performance.
  • To overcome this, they introduce a “noisy teacher” mechanism (DAGMaN) within the co-distillation setup to maintain high attention-head diversity while still performing attentive masking.
  • Experiments across multiple medical imaging tasks (lung nodule classification, immunotherapy outcome prediction, tumor segmentation, and organ clustering) demonstrate DAGMaN’s effectiveness as a self-supervised learning approach.
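The core idea of the masking strategy above can be illustrated with a minimal sketch: instead of masking patches uniformly at random, rank patches by how much attention they receive and hide the most-attended (discriminative) ones, making the reconstruction task harder. This is an illustrative simplification, not the paper's implementation; the function name `attention_guided_mask` and the exact selection rule are assumptions.

```python
import numpy as np

def attention_guided_mask(attn_scores, mask_ratio=0.6):
    """Mask the most-attended patches first (illustrative sketch).

    attn_scores: (num_patches,) array of per-patch attention, e.g.
    averaged over heads/layers of a teacher network. The paper's
    actual selection of "semantically co-occurring" patches is
    more involved; here we simply take the top-scoring fraction.
    """
    n = attn_scores.shape[0]
    n_mask = int(round(mask_ratio * n))
    # indices of patches sorted by descending attention
    order = np.argsort(attn_scores)[::-1]
    mask = np.zeros(n, dtype=bool)
    mask[order[:n_mask]] = True  # hide the most informative patches
    return mask

# toy example: 8 patches, patches 2 and 5 are the most salient
scores = np.array([0.1, 0.2, 0.9, 0.1, 0.3, 0.8, 0.2, 0.1])
m = attention_guided_mask(scores, mask_ratio=0.25)  # masks patches 2 and 5
```

Compared with random masking, this concentrates the masked region on patches the model finds informative, which is the leakage-reduction effect the bullet points describe.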

Abstract

Masked image modeling (MIM) is a highly effective self-supervised learning (SSL) approach for extracting useful feature representations from unannotated data. The predominantly used random masking methods make SSL less effective for medical images: the contextual similarity of neighboring patches leads to information leakage and oversimplifies the SSL task. The hierarchical shifted-window (Swin) transformer, a highly effective architecture for medical images, cannot use advanced masking methods because it lacks a global [CLS] token. Hence, we introduce an attention-guided masking mechanism for Swin within a co-distillation learning framework that selectively masks semantically co-occurring and discriminative patches, reducing information leakage and increasing the difficulty of SSL pretraining. However, attention-guided masking inevitably reduces the diversity of attention heads, which negatively impacts downstream task performance. To address this, we integrate, for the first time, a noisy teacher into the co-distillation framework (termed DAGMaN) that performs attentive masking while preserving high attention-head diversity. We demonstrate the capability of DAGMaN on multiple tasks, including full- and few-shot lung nodule classification, immunotherapy outcome prediction, tumor segmentation, and unsupervised organ clustering.
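The noisy-teacher idea can likewise be sketched in a few lines: perturb each teacher attention head with noise before the heads are aggregated into a masking signal, so that attentive masking does not collapse all heads onto the same patches. This is a hedged illustration only; the function name `noisy_teacher_attention`, the Gaussian noise model, and the renormalization step are assumptions, and DAGMaN's actual noise mechanism may differ.

```python
import numpy as np

def noisy_teacher_attention(head_attn, noise_std=0.1, rng=None):
    """Perturb per-head teacher attention maps (illustrative sketch).

    head_attn: (num_heads, num_patches) array of attention weights,
    each row summing to 1. Gaussian noise is added per head and the
    rows are renormalized, so each head contributes a slightly
    different masking signal and head diversity is preserved.
    """
    rng = np.random.default_rng(rng)
    noisy = head_attn + rng.normal(0.0, noise_std, head_attn.shape)
    noisy = np.clip(noisy, 1e-8, None)  # keep weights positive
    return noisy / noisy.sum(axis=-1, keepdims=True)

# toy example: 4 heads over 8 patches, initially identical
heads = np.full((4, 8), 1.0 / 8)
noisy = noisy_teacher_attention(heads, noise_std=0.05, rng=0)
```

After the perturbation, identical heads diverge while remaining valid attention distributions, which is the property the paper relies on to keep attention-head diversity high during attentive masking.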