Cross-Modal Knowledge Distillation from Spatial Transcriptomics to Histology

arXiv cs.CV / 4/13/2026


Key Points

  • The paper introduces a cross-modal knowledge distillation framework that transfers tissue niche structure learned from spatial transcriptomics to a histology-only model using paired training data (spatial transcriptomics + H&E).
  • It addresses a mismatch in data availability: spatial transcriptomics is costly and scarce, whereas H&E slides are abundant but carry a less granular signal. Distillation lets a model that sees only H&E predict transcriptomics-informed niche structure at inference time.
  • Experiments across multiple tissue types and disease contexts show the distilled histology model agrees substantially better with transcriptomics-derived niche structure than unsupervised morphology-based baselines trained on identical image features.
  • The approach also recovers biologically meaningful neighborhood composition, supported by downstream cell-type analysis.
  • After training with paired modalities, the method can be applied to new tissue regions using histology alone, with no transcriptomic input during inference.
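The train-with-both, infer-with-one workflow described in the points above can be illustrated with a deliberately small sketch. It is not the paper's architecture, which this summary does not specify. It assumes that teacher niche labels have already been obtained by clustering the transcriptomic profiles, and it swaps in a nearest-centroid classifier as the histology-only student. The function names, the feature vectors, and the niche names ("stroma", "tumor") are all hypothetical.

```python
from collections import defaultdict
from math import dist


def fit_student(image_features, teacher_niches):
    """Fit a nearest-centroid student: one mean image-feature vector per teacher niche."""
    groups = defaultdict(list)
    for feats, niche in zip(image_features, teacher_niches):
        groups[niche].append(feats)
    return {
        niche: [sum(col) / len(rows) for col in zip(*rows)]
        for niche, rows in groups.items()
    }


def predict_niche(centroids, feats):
    """Assign a spot to the niche whose centroid is closest in image-feature space."""
    return min(centroids, key=lambda niche: dist(centroids[niche], feats))


# Training: paired spots. The teacher labels are ASSUMED to come from
# clustering transcriptomic profiles (that step is not shown here).
train_image_features = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
train_teacher_niches = ["stroma", "stroma", "tumor", "tumor"]  # hypothetical names
student = fit_student(train_image_features, train_teacher_niches)

# Inference: held-out spots, histology-derived features only.
held_out_features = [[0.15, 0.15], [0.85, 0.85]]
predictions = [predict_niche(student, f) for f in held_out_features]
```

The point of the sketch is the data flow: transcriptomics influences the student only through the training labels, so `predict_niche` needs nothing but image features.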

Abstract

Spatial transcriptomics provides a molecularly rich description of tissue organization, enabling unsupervised discovery of tissue niches -- spatially coherent regions of distinct cell-type composition and function that are relevant to both biological research and clinical interpretation. However, spatial transcriptomics remains costly and scarce, while H&E histology is abundant but carries a less granular signal. We propose to leverage paired spatial transcriptomics and H&E data to transfer transcriptomics-derived niche structure to a histology-only model via cross-modal distillation. Across multiple tissue types and disease contexts, the distilled model achieves substantially higher agreement with transcriptomics-derived niche structure than unsupervised morphology-based baselines trained on identical image features, and recovers biologically meaningful neighborhood composition as confirmed by cell-type analysis. The resulting framework leverages paired spatial transcriptomic and H&E data during training, and can then be applied to held-out tissue regions using histology alone, without any transcriptomic input at inference time.
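The abstract reports "agreement with transcriptomics-derived niche structure" without naming the metric in this summary. As one plausible way to quantify such agreement, the sketch below computes the adjusted Rand index (ARI), a standard permutation-invariant score for comparing two label assignments. The choice of ARI and the toy label lists are assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same spots."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("label lists must have the same length")
    if n < 2:
        return 1.0  # zero or one spot: the partitions trivially agree

    # Number of spot pairs placed together in both, in A, and in B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are one big cluster
    return (together_both - expected) / (maximum - expected)


# Hypothetical niche assignments for six spots.
teacher_niches = [0, 0, 1, 1, 2, 2]    # derived from transcriptomics
distilled_niches = [1, 1, 0, 0, 2, 2]  # same grouping, different cluster ids
baseline_niches = [0, 1, 0, 1, 0, 1]   # grouping unrelated to the teacher

ari_distilled = adjusted_rand_index(teacher_niches, distilled_niches)
ari_baseline = adjusted_rand_index(teacher_niches, baseline_niches)
```

Because ARI ignores how clusters are numbered, it can score an unsupervised baseline whose cluster ids have no correspondence to the teacher's, so the baseline and the distilled model are measured on the same footing.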