Sparse Task Vector Mixup with Hypernetworks for Efficient Knowledge Transfer in Whole-Slide Image Prognosis

arXiv cs.CV / 3/12/2026

Key Points

STEPH (Sparse Task Vector Mixup with Hypernetworks) transfers prognostic knowledge across cancer types for whole-slide image (WSI) prognosis via model merging.
It applies task vector mixup to each source-target cancer pair, then sparsely aggregates the resulting mixtures into an improved target model, with the process driven by hypernetworks.
The approach requires neither large-scale joint training nor extensive multi-model inference, which makes knowledge transfer more computationally efficient.
Experiments on 13 cancer datasets show that STEPH improves over cancer-specific learning by 5.14% and over an existing knowledge-transfer baseline by 2.01%.
The code is publicly available at https://github.com/liupei101/STEPH.

Abstract

Whole-Slide Images (WSIs) are widely used for estimating the prognosis of cancer patients. Current studies generally follow a cancer-specific learning paradigm. However, the training samples available for any one cancer type are usually scarce in pathology. Consequently, the model often struggles to learn generalizable knowledge and performs worse on tumor samples, which are inherently highly heterogeneous. Although multi-cancer joint learning and knowledge transfer approaches have recently been explored to address this problem, they rely on either large-scale joint training or extensive inference across multiple models, posing new challenges in computational efficiency. To this end, this paper proposes a new scheme, Sparse Task Vector Mixup with Hypernetworks (STEPH). Unlike previous approaches, it efficiently absorbs generalizable knowledge from other cancers for the target via model merging: i) applying task vector mixup to each source-target pair and then ii) sparsely aggregating the task vector mixtures to obtain an improved target model, driven by hypernetworks. Extensive experiments on 13 cancer datasets show that STEPH improves over cancer-specific learning and an existing knowledge transfer baseline by 5.14% and 2.01%, respectively. Moreover, it is a more efficient solution for learning prognostic knowledge from other cancers, without requiring large-scale joint training or extensive multi-model inference. Code is publicly available at https://github.com/liupei101/STEPH.
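
The two steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and the abstract does not give the exact formulas. The sketch assumes that a task vector is the difference between a cancer-specific model's weights and a shared base model's weights (the common definition in the model-merging literature), that mixup is a convex combination with coefficient `lam`, and that sparse aggregation keeps only the `top_k` highest-scoring mixtures. In STEPH the coefficients are driven by hypernetworks; here `lam` and `scores` are plain inputs, and all function names are hypothetical.

```python
import numpy as np

def task_vector(model, base):
    """Task vector: per-parameter difference from the shared base (assumed definition)."""
    return {k: model[k] - base[k] for k in base}

def mixup(tv_target, tv_source, lam):
    """Convex mix of a target and a source task vector (lam is an assumed scalar)."""
    return {k: lam * tv_target[k] + (1.0 - lam) * tv_source[k] for k in tv_target}

def sparse_aggregate(mixtures, scores, top_k):
    """Keep the top_k highest-scoring mixtures and combine them with softmax weights.

    In STEPH such scores would come from hypernetworks; here they are given.
    """
    scores = np.asarray(scores, dtype=float)
    keep = np.argsort(scores)[-top_k:]            # indices of the top_k scores
    weights = np.zeros_like(scores)
    exp = np.exp(scores[keep] - scores[keep].max())
    weights[keep] = exp / exp.sum()               # all other mixtures get weight 0
    merged = {
        k: sum(w * m[k] for w, m in zip(weights, mixtures))
        for k in mixtures[0]
    }
    return merged, weights

def build_target_model(base, target, sources, lam, scores, top_k):
    """Base weights plus the sparsely aggregated task vector mixtures."""
    tv_t = task_vector(target, base)
    mixtures = [mixup(tv_t, task_vector(s, base), lam) for s in sources]
    merged, weights = sparse_aggregate(mixtures, scores, top_k)
    return {k: base[k] + merged[k] for k in base}, weights
```

Because the merge yields a single set of weights, inference runs one model regardless of how many source cancers contributed, which is consistent with the efficiency claim in the abstract.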
