Graph Transformer-Based Pathway Embedding for Cancer Prognosis

arXiv cs.LG / 4/21/2026

Key Points

  • The paper addresses the challenge of predicting cancer progression from heterogeneous multi-omics data, focusing on how models encode genes for pathway representations.
  • It introduces PATH, a modulation-based, patient-conditioned gene embedding method that begins with shared gene base embeddings and then adapts them using patient-specific CNV and mutation signals.
  • PATH is implemented within a graph transformer that uses pathway-guided attention to model interactions among biologically connected pathways.
  • In pancancer metastasis prediction, PATH reaches an F1 score of 0.8766, an 8.8% improvement over current SOTA multi-omics benchmarks, while also producing biologically meaningful pathway findings and disease-state-specific “pathway rewiring.”
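The summary does not say how the modulation is computed. As a minimal sketch, assuming a FiLM-style scale-and-shift (an illustrative stand-in, not the authors' confirmed formulation), a shared base embedding can be adapted per patient as below. The function name, the weight matrices w_scale and w_shift, the embedding size, and the toy patient values are all hypothetical; in a real model the base embeddings and weights would be learned.

```python
import math
import random


def modulate_gene_embedding(base, cnv, mutated, w_scale, w_shift):
    """Adapt a shared gene embedding with one patient's CNV and mutation signal.

    Assumed form (FiLM-style, not taken from the paper):
        adapted[d] = base[d] * (1 + gamma[d]) + beta[d]
    where gamma and beta are linear functions of the (cnv, mutation) signal.
    """
    signal = (float(cnv), 1.0 if mutated else 0.0)
    adapted = []
    for d, value in enumerate(base):
        # tanh keeps the multiplicative factor (1 + gamma) inside (0, 2).
        gamma = math.tanh(w_scale[d][0] * signal[0] + w_scale[d][1] * signal[1])
        beta = w_shift[d][0] * signal[0] + w_shift[d][1] * signal[1]
        adapted.append(value * (1.0 + gamma) + beta)
    return adapted


rng = random.Random(0)
DIM = 4  # hypothetical embedding size
GENES = ["TP53", "KRAS", "MYC"]

# One base embedding per gene, shared by every patient (random stand-ins here).
base_embeddings = {g: [rng.gauss(0.0, 1.0) for _ in range(DIM)] for g in GENES}

# Hypothetical modulation weights: one (cnv, mutation) weight pair per dimension.
w_scale = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(DIM)]
w_shift = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(DIM)]

# Toy patient profile: gene -> (CNV value, mutated flag).
patient = {"TP53": (-1.0, True), "KRAS": (0.0, False), "MYC": (2.0, False)}

patient_embeddings = {}
for gene in GENES:
    cnv, mutated = patient[gene]
    patient_embeddings[gene] = modulate_gene_embedding(
        base_embeddings[gene], cnv, mutated, w_scale, w_shift
    )
```

Under this assumed form, a gene with no copy-number change and no mutation (KRAS in the toy profile) keeps its base embedding exactly, while altered genes move away from it. This is one way to read "preserving a stable biological identity" while still conditioning on the patient.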

Abstract

Accurate prediction of cancer progression remains a challenge due to the high heterogeneity of molecular omics data across patients. While biologically informed models have improved the interpretability of these predictions, a persistent limitation lies in how they encode individual genes to construct pathway representations. Existing hierarchical models typically derive gene features by directly mapping raw molecular inputs, whereas integration frameworks often rely on simple statistical aggregations of patient-level signals. These approaches often fail to explicitly learn a shared base representation for each gene, thereby limiting the expressiveness and biological accuracy of downstream pathway embeddings. To address this, we introduce PATH, a modulation-based, patient-conditioned gene embedding strategy. PATH represents a paradigm shift by starting from a shared base embedding for each gene, preserving a stable biological identity across the population, and then dynamically adapting it using patient-specific copy number variation (CNV) and mutation signals. This allows the model to capture subtle individual molecular variations while maintaining a consistent latent understanding of the gene itself. We integrate PATH into a graph transformer framework that models interactions among biologically connected pathways through pathway-guided attention. Across pancancer metastasis prediction, PATH achieves an F1 score of 0.8766, representing an 8.8 percent improvement over the current SOTA multi-omics benchmarks. Beyond superior predictive accuracy, our approach identifies biologically meaningful pathways and, crucially, reveals disease-state-specific pathway rewiring, offering new insights into the evolving pathway-pathway interactions that drive cancer progression.
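The abstract does not spell out how pathway-guided attention is computed. One common way to realize the idea, shown here as a sketch under stated assumptions rather than the paper's actual layer, is masked scaled dot-product attention: each pathway attends only to itself and to pathways it is biologically connected to. The function name, the adjacency matrix, and the toy embeddings are hypothetical, and the learned query, key, and value projections of a real transformer are omitted for brevity.

```python
import math


def pathway_guided_attention(embeddings, adjacency):
    """Masked single-head attention over pathway embeddings.

    embeddings: one vector per pathway.
    adjacency:  adjacency[i][j] is truthy when pathways i and j are connected.
    Returns (updated embeddings, attention weight matrix).
    Assumption: raw embeddings serve as queries, keys, and values.
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    outputs, weights = [], []
    for i in range(n):
        # A pathway may attend to itself and to its connected pathways only.
        allowed = [j for j in range(n) if j == i or adjacency[i][j]]
        scores = [
            sum(a * b for a, b in zip(embeddings[i], embeddings[j])) / math.sqrt(dim)
            for j in allowed
        ]
        # Numerically stable softmax restricted to the allowed pathways.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        row = [0.0] * n
        for j, e in zip(allowed, exps):
            row[j] = e / total
        weights.append(row)
        outputs.append(
            [sum(row[j] * embeddings[j][d] for j in range(n)) for d in range(dim)]
        )
    return outputs, weights


# Toy example: pathways 0 and 1 are connected, pathway 2 is isolated.
pathway_embeddings = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
pathway_adjacency = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
updated, attention = pathway_guided_attention(pathway_embeddings, pathway_adjacency)
```

In this sketch the attention matrix is zero wherever two pathways are not connected. Comparing the nonzero entries across disease states is one plausible way to inspect the "pathway rewiring" the abstract mentions, although the paper's own analysis procedure is not described here.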