Unified Multi-Foundation-Model Slide Representation for Pan-Cancer Recognition and Text-Guided Tumor Localization

arXiv cs.CV · April 28, 2026


Key Points

  • The paper introduces ASTRA, a pan-cancer framework that unifies fragmented tile-level representations from multiple pathology foundation models into a shared slide-level representation space for clinical-grade slide reasoning.
  • ASTRA semantically anchors this shared space using structured pathology metadata fields (classification category, cancer type, and anatomic site), enabling interpretability and text-guided localization.
  • The method combines sparse mixture-of-experts contextualization, masked multi-model reconstruction, and contrastive alignment to structured pathology prompts, learning slide representations that support multi-level classification and weakly supervised tumor localization.
  • Trained on 10,359 whole-slide images across 16 tumor types from a CHTN cohort, ASTRA improves pan-cancer classification across four foundation-model backbones, reaching up to 97.8% macro-AUC (4-category classification), 99.7% (3-class solid tumor typing), and 99.2% (16-class cancer typing).
  • For localization, ASTRA achieves a mean Dice score of 0.897 on an in-domain annotated subset (n=380) and 0.738 on an external TCGA subset (n=1,686), showing strong generalization without pixel-level supervision.
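The contrastive-alignment component pairs each slide embedding with a text prompt built from its structured metadata fields (classification category, cancer type, anatomic site). The paper does not specify the exact objective; below is a minimal sketch assuming a standard symmetric InfoNCE-style loss, with a hypothetical prompt template — both the template wording and the temperature value are assumptions, not ASTRA's published details.

```python
import numpy as np

def build_prompt(category, cancer_type, site):
    """Hypothetical prompt template over the three metadata fields
    named in the paper (exact wording is an assumption)."""
    return f"classification: {category}; cancer type: {cancer_type}; anatomic site: {site}"

def info_nce_loss(slide_emb, prompt_emb, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss between L2-normalized
    slide embeddings and prompt embeddings (sketch, not ASTRA's exact loss).
    Matched slide/prompt pairs share a row index."""
    s = slide_emb / np.linalg.norm(slide_emb, axis=1, keepdims=True)
    t = prompt_emb / np.linalg.norm(prompt_emb, axis=1, keepdims=True)
    logits = s @ t.T / temperature          # (N, N) cosine similarities
    idx = np.arange(len(s))                 # matched pairs on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)        # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()               # NLL of the matched pair

    # average of slide->prompt and prompt->slide directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Aligned slide/prompt pairs drive the loss toward zero, while mismatched pairings keep it high, which is the property that lets the shared space support text-guided queries at inference time.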

Abstract

The expanding ecosystem of pathology foundation models has produced powerful but fragmented tile-level representations, limiting their use in clinical tasks that require unified slide-level reasoning and interpretable linkage to clinically meaningful information. We present ASTRA, a pan-cancer framework that integrates heterogeneous foundation-model representations into a shared slide-level representation space and semantically grounds that space using structured pathology annotation fields, including classification category, cancer type, and anatomic site. ASTRA combines sparse mixture-of-experts contextualization, masked multi-model reconstruction, and contrastive alignment to structured pathology prompts to learn slide representations that support 4-category classification, 3-class solid tumor typing, 16-class cancer typing, and text-guided tumor localization without pixel-level supervision. Developed on a CHTN cohort of 10,359 whole-slide images (WSIs) spanning 16 tumor types, ASTRA consistently improves pan-cancer classification across four pathology foundation-model backbones, achieving up to 97.8% macro-AUC for 4-category classification, 99.7% for 3-class solid tumor typing, and 99.2% for 16-class cancer typing. For tumor localization, ASTRA achieves a mean Dice of 0.897 on an annotated in-domain CHTN subset (n = 380) spanning 16 cancer types and 0.738 on an external TCGA cohort (n = 1,686) spanning four cancer types. These results demonstrate that minimal structured pathology annotation fields derived from slide-level metadata can provide effective semantic supervision for unified slide representation learning, enabling both pan-cancer prediction and weakly supervised tumor localization within a single framework.
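The Dice scores reported above (0.897 in-domain, 0.738 on TCGA) use the standard overlap metric for binary segmentation masks, 2|A∩B| / (|A| + |B|). A minimal sketch (the `eps` smoothing term for empty masks is a common convention, not something stated in the paper):

```python
import numpy as np

def dice_score(pred_mask, gt_mask, eps=1e-8):
    """Dice coefficient between two binary masks:
    2 * |pred AND gt| / (|pred| + |gt|). Returns a value in [0, 1]."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter) / (pred.sum() + gt.sum() + eps)
```

For example, a predicted tumor mask covering half of the annotated region (and an equal area outside it) scores 0.5, while perfect agreement scores 1.0.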