Dino-NestedUNet: Unlocking Foundation Vision Encoders for Pathology Tumor Bulk Segmentation via Dense Decoding

arXiv cs.CV / 5/5/2026

📰 NewsModels & Research

共有:

Key Points

The paper introduces Dino-NestedUNet, which pairs a pre-trained DINOv3 vision foundation encoder with a “Nested Dense Decoder” designed for more accurate boundary reconstruction in pathology tumor bulk segmentation.
It argues that prior approaches that freeze VFMs and attach lightweight decoders suffer from capacity mismatch, leading to poorer boundary fidelity for infiltrative tumors.
Dino-NestedUNet replaces sparse skip connections and simple upsampling with a dense grid of intermediate pathways to support continuous feature reuse and multi-scale recalibration during decoding.
Experiments on three histopathology cohorts (CHTN, OSU, CAMELYON16) show consistent gains over UNet++ and standard Dino-UNet variants, with especially strong benefits under cross-domain shift.
The model also demonstrates promising external generalization via zero-shot testing (train on CHTN, test on TIGER WSIBULK and OSU CRC) without fine-tuning, highlighting the value of dense decoding for foundation-encoder segmentation tasks.

Abstract

Vision foundation models (VFMs), such as DINOv3, provide rich semantic representations that are promising for computational pathology. However, many current adaptations pair frozen VFMs with lightweight decoders, creating a capacity mismatch that often limits boundary fidelity for infiltrative tumor bulk segmentation. This paper presents Dino-NestedUNet, a framework that couples a pre-trained DINOv3 encoder with a Nested Dense Decoder. Instead of sparse skip connections and linear upsampling, the proposed decoder forms a dense grid of intermediate pathways to enable continuous feature reuse and multi-scale recalibration, aligning high-level semantics with low-level morphological textures during reconstruction. We evaluate Dino-NestedUNet on three histopathology cohorts (multi-center CHTN, institutional OSU, and CAMELYON16) and observe consistent improvements over UNet++ and standard Dino-UNet variants, particularly under cross-domain shift. To further assess external generalization, we perform zero-shot evaluation by training on CHTN and directly testing on unseen TIGER WSIBULK and OSU CRC cohorts without fine-tuning. These results suggest that dense decoding is a key ingredient for unlocking foundation encoders in boundary-sensitive pathology segmentation.

Singapore's Fraud Frontier: Why AI Scam Detection Demands Regulatory Precision

Dev.to

From OOM to 262K Context: Running Qwen3-Coder 30B Locally on 8GB VRAM

Dev.to

Nano Banana Pro vs DALL-E 3 vs Midjourney: A Practical Comparison From Someone Who Actually Uses All Three

Dev.to

LLMs edited 86 human essays toward a semantic cluster not occupied by any human writer [D]

Reddit r/MachineLearning

Fake News Detection using Machine Learning & NLP!

Dev.to

Dino-NestedUNet: Unlocking Foundation Vision Encoders for Pathology Tumor Bulk Segmentation via Dense Decoding

Key Points

Abstract

Related Articles

Singapore's Fraud Frontier: Why AI Scam Detection Demands Regulatory Precision

From OOM to 262K Context: Running Qwen3-Coder 30B Locally on 8GB VRAM

Nano Banana Pro vs DALL-E 3 vs Midjourney: A Practical Comparison From Someone Who Actually Uses All Three

LLMs edited 86 human essays toward a semantic cluster not occupied by any human writer [D]

Fake News Detection using Machine Learning & NLP!

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer