CHRep: Cross-modal Histology Representation and Post-hoc Calibration for Spatial Gene Expression Prediction

arXiv cs.CV / 4/24/2026


Key Points

  • The paper introduces CHRep, a two-phase framework to predict spatial gene expression from routine H&E slides, addressing the high cost and low throughput of spatial transcriptomics for large studies and clinical use.
  • During training, CHRep learns structure-aware histology representations using correlation-aware regression, symmetric image–expression alignment, and spatial topology regularization based on coordinates.
  • At inference, CHRep improves cross-slide robustness without fine-tuning the backbone: a lightweight post-hoc calibration module, trained on the training slides, combines a non-parametric estimate from a training gallery with a magnitude-regularized correction.
  • Under realistic leave-one-slide-out evaluation on three cohorts, CHRep improves gene-wise correlation. The Pearson correlation over all considered genes, PCC(ACG), rises by 4.0% on cSCC and 9.8% on HER2+ relative to HAGE, and by 39.5% on Alex+10x relative to mclSTExp, where MSE and MAE also fall by 9.7% and 9.0%, respectively.
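
To make the inference-phase calibration more concrete, here is a minimal sketch of one way such a module could look. It is not the authors' implementation. The summary does not specify the estimator or the correction, so this sketch assumes cosine-similarity k-nearest-neighbour averaging for the non-parametric gallery estimate and a closed-form ridge regression on the residuals for the magnitude-regularized correction. All function names and the parameters `k` and `lam` are hypothetical.

```python
import numpy as np

def knn_gallery_estimate(query_emb, gallery_emb, gallery_expr, k=5):
    """Assumed non-parametric estimate: mean expression of the k gallery
    spots whose embeddings are most cosine-similar to each query spot."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sim = q @ g.T                            # (n_query, n_gallery)
    idx = np.argsort(-sim, axis=1)[:, :k]    # top-k neighbours per query
    return gallery_expr[idx].mean(axis=1)    # (n_query, n_genes)

def fit_ridge_correction(emb, residual, lam=1.0):
    """Assumed correction: ridge regression from embeddings to residuals.
    The L2 penalty `lam` shrinks the weights, keeping the correction small."""
    d = emb.shape[1]
    return np.linalg.solve(emb.T @ emb + lam * np.eye(d), emb.T @ residual)

def fit_calibration(gallery_emb, gallery_expr, calib_emb, calib_expr,
                    k=5, lam=1.0):
    """Fit the correction on training spots held out from the gallery."""
    base = knn_gallery_estimate(calib_emb, gallery_emb, gallery_expr, k)
    return fit_ridge_correction(calib_emb, calib_expr - base, lam)

def calibrated_prediction(query_emb, gallery_emb, gallery_expr, W, k=5):
    """Gallery estimate plus the learned correction. The backbone that
    produced the embeddings is frozen and never updated here."""
    base = knn_gallery_estimate(query_emb, gallery_emb, gallery_expr, k)
    return base + query_emb @ W
```

In this sketch the embeddings would come from the frozen backbone, the gallery and the calibration split would both be drawn from the training slides, and only `W` is fitted. A larger `lam` pushes the prediction back toward the pure gallery estimate.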

Abstract

Spatial transcriptomics (ST) enables spatially resolved gene profiling but remains expensive and low-throughput, limiting large-cohort studies and routine clinical use. Predicting spatial gene expression from routine hematoxylin and eosin (H&E) slides is a promising alternative, yet under realistic leave-one-slide-out evaluation, existing models often suffer from slide-level appearance shifts and regression-driven over-smoothing that suppress biologically meaningful variation. CHRep is a two-phase framework for robust histology-to-expression prediction. In the training phase, CHRep learns a structure-aware representation by jointly optimizing correlation-aware regression, symmetric image-expression alignment, and coordinate-induced spatial topology regularization. In the inference phase, cross-slide robustness is improved without backbone fine-tuning through a lightweight calibration module trained on the training slides, which combines a non-parametric estimate from a training gallery with a magnitude-regularized correction module. Unlike prior embedding-alignment or retrieval-based transfer methods that rely on a single prediction route, CHRep couples topology-preserving representation learning with post-hoc calibration, enabling stable neighborhood retrieval and controlled bias correction under slide-level shifts. Across the three cohorts, CHRep consistently improves gene-wise correlation under leave-one-slide-out evaluation, with the largest gains observed on Alex+10x. Relative to HAGE, the Pearson correlation coefficient on all considered genes [PCC(ACG)] increases by 4.0% on cSCC and 9.8% on HER2+. Relative to mclSTExp, PCC(ACG) further improves by 39.5% on Alex+10x, together with 9.7% and 9.0% reductions in mean squared error (MSE) and mean absolute error (MAE), respectively.
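
The evaluation protocol and metrics named in the abstract can be sketched as follows. This is an illustrative sketch, not the paper's evaluation code. It assumes that PCC(ACG) is the mean of per-gene Pearson correlations computed across the spots of the held-out slide, and that MSE and MAE are averaged over all spots and genes. The function names and slide identifiers are hypothetical.

```python
import numpy as np

def genewise_pcc(y_true, y_pred, eps=1e-12):
    """Pearson correlation per gene (column), computed across spots (rows)."""
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0)) + eps
    return num / den

def evaluate_slide(y_true, y_pred):
    """Metrics for one held-out slide; arrays are (n_spots, n_genes)."""
    return {
        "pcc_acg": float(genewise_pcc(y_true, y_pred).mean()),
        "mse": float(np.mean((y_true - y_pred) ** 2)),
        "mae": float(np.mean(np.abs(y_true - y_pred))),
    }

def leave_one_slide_out(slide_ids):
    """Yield (training slides, held-out slide) for every slide in turn."""
    for held_out in slide_ids:
        train = [s for s in slide_ids if s != held_out]
        yield train, held_out
```

Each fold trains on all slides but one and scores the held-out slide with `evaluate_slide`, so slide-level appearance shifts between training and test data are part of what is measured.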