AI Navigate

Advancing Cancer Prognosis with Hierarchical Fusion of Genomic, Proteomic and Pathology Imaging Data from a Systems Biology Perspective

arXiv cs.CV / 3/17/2026


Key Points

  • The paper proposes HFGPI, a hierarchical fusion framework that integrates genomic, proteomic, and histology imaging data to improve cancer prognosis by modeling the biological progression from genes to proteins to images.
  • It introduces Molecular Tokenizer, a strategy combining identity embeddings with expression profiles to create biologically informed representations for genes and proteins.
  • It presents Gene-Regulated Protein Fusion (GRPF), using graph-aware cross-attention with structure-preserving alignment to capture gene–protein regulatory relationships.
  • It develops Protein-Guided Hypergraph Learning (PGHL), which links proteins to image patches and uses hypergraph convolution to capture higher-order protein–morphology relationships; the resulting features are then fused progressively across hierarchical layers.
  • Experiments on five benchmark datasets show that HFGPI outperforms state-of-the-art methods in survival prediction.
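The two molecular components above can be illustrated with a minimal sketch. It assumes that a gene or protein token is formed by adding a learned identity embedding to an expression-scaled projection vector. It also models GRPF's graph-aware cross-attention as scaled dot-product attention from protein queries to gene keys, masked by a hypothetical gene–protein adjacency list. The functions `tokenize` and `graph_masked_cross_attention` and the toy data are illustrative assumptions, not the authors' code. Learned query/key/value projections and the structure-preserving alignment term are omitted.

```python
import math
import random

def tokenize(expr, id_emb, w_expr):
    """Hypothetical Molecular Tokenizer: token_i = identity_embedding_i + expr_i * w_expr."""
    return [[e + x * w for e, w in zip(id_emb[i], w_expr)] for i, x in enumerate(expr)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def graph_masked_cross_attention(protein_tokens, gene_tokens, adjacency):
    """Assumed GRPF-style step: protein p (query) attends only to the genes in adjacency[p]."""
    d = len(gene_tokens[0])
    out = []
    for p, q in enumerate(protein_tokens):
        linked = adjacency[p]
        if not linked:
            out.append(list(q))  # no listed regulator: keep the protein token as-is
            continue
        weights = softmax([dot(q, gene_tokens[g]) / math.sqrt(d) for g in linked])
        attended = [sum(w * gene_tokens[g][k] for w, g in zip(weights, linked)) for k in range(d)]
        out.append([a + b for a, b in zip(q, attended)])  # residual connection
    return out

# Toy example: 3 genes, 2 proteins, 4-dimensional tokens with random identity embeddings.
random.seed(0)
dim = 4
gene_expr = [0.5, -1.2, 2.0]
protein_expr = [1.0, 0.3]
gene_ids = [[random.gauss(0, 1) for _ in range(dim)] for _ in gene_expr]
protein_ids = [[random.gauss(0, 1) for _ in range(dim)] for _ in protein_expr]
w_gene = [random.gauss(0, 1) for _ in range(dim)]
w_protein = [random.gauss(0, 1) for _ in range(dim)]

gene_tokens = tokenize(gene_expr, gene_ids, w_gene)
protein_tokens = tokenize(protein_expr, protein_ids, w_protein)
# Hypothetical regulatory links: protein 0 <- genes 0 and 2; protein 1 <- gene 1.
adjacency = [[0, 2], [1]]
regulated = graph_masked_cross_attention(protein_tokens, gene_tokens, adjacency)
```

Because attention is restricted to the listed regulators, protein 1 in this toy example receives information only from gene 1. Changing any unlisted gene leaves a protein's output untouched.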

Abstract

To enhance the precision of cancer prognosis, recent research has increasingly focused on multimodal survival methods that integrate genomic data and histology images. However, current approaches overlook the fact that the proteome serves as an intermediate layer bridging genomic alterations and histopathological features while providing complementary biological information essential for survival prediction. This biological reality exposes another architectural limitation: existing integrative analysis studies fuse these heterogeneous data sources in a flat manner that fails to capture their inherent biological hierarchy. To address these limitations, we propose HFGPI, a hierarchical fusion framework that models the biological progression from genes to proteins to histology images from a systems biology perspective. Specifically, we introduce Molecular Tokenizer, a molecular encoding strategy that integrates identity embeddings with expression profiles to construct biologically informed representations for genes and proteins. We then develop Gene-Regulated Protein Fusion (GRPF), which employs graph-aware cross-attention with structure-preserving alignment to explicitly model gene–protein regulatory relationships and generate gene-regulated protein representations. Additionally, we propose Protein-Guided Hypergraph Learning (PGHL), which establishes associations between proteins and image patches and leverages hypergraph convolution to capture higher-order protein–morphology relationships. The final features are progressively fused across hierarchical layers to achieve precise survival outcome prediction. Extensive experiments on five benchmark datasets demonstrate the superiority of HFGPI over state-of-the-art methods.
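The protein-guided hypergraph step can be sketched in the same spirit. The sketch assumes that each protein defines one hyperedge joining that protein node with its top-k most similar image patches (by dot product). It then applies one step of the standard HGNN-style hypergraph convolution, X' = Dv^(-1/2) H De^(-1) H^T Dv^(-1/2) X, with unit hyperedge weights and no learned projection. The hyperedge rule, the functions `build_protein_hyperedges` and `hypergraph_conv`, and the toy features are hypothetical, and the paper's actual PGHL construction may differ.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_protein_hyperedges(protein_feats, patch_feats, k):
    """Hypothetical rule: hyperedge e_p = {protein p} plus its top-k most similar patches.
    Node indices: proteins first (0..P-1), then patches (P..P+N-1)."""
    num_proteins = len(protein_feats)
    edges = []
    for p, q in enumerate(protein_feats):
        ranked = sorted(range(len(patch_feats)),
                        key=lambda j: dot(q, patch_feats[j]), reverse=True)
        edges.append([p] + [num_proteins + j for j in ranked[:k]])
    return edges

def hypergraph_conv(features, edges):
    """One HGNN-style step with unit hyperedge weights and no learned projection:
    X' = Dv^(-1/2) H De^(-1) H^T Dv^(-1/2) X."""
    n, d = len(features), len(features[0])
    node_deg = [0] * n
    for e in edges:
        for v in e:
            node_deg[v] += 1
    # Node -> hyperedge: degree-normalized mean of member features.
    messages = []
    for e in edges:
        msg = [0.0] * d
        for u in e:
            for k in range(d):
                msg[k] += features[u][k] / math.sqrt(node_deg[u])
        messages.append([m / len(e) for m in msg])
    # Hyperedge -> node: sum incident messages, normalized by node degree.
    out = [[0.0] * d for _ in range(n)]
    for e, msg in zip(edges, messages):
        for v in e:
            for k in range(d):
                out[v][k] += msg[k] / math.sqrt(node_deg[v])
    # Assumption: nodes in no hyperedge keep their input features
    # (the plain formula would map them to zero).
    for v in range(n):
        if node_deg[v] == 0:
            out[v] = list(features[v])
    return out

# Toy example: 2 protein embeddings and 4 patch embeddings in the same 2-d space.
proteins = [[1.0, 0.0], [0.0, 1.0]]
patches = [[0.9, 0.1], [0.1, 0.8], [0.5, 0.5], [-1.0, 0.0]]
edges = build_protein_hyperedges(proteins, patches, k=2)
updated = hypergraph_conv(proteins + patches, edges)
```

Here patch 2 is selected by both proteins, so it acts as a shared morphology node linking the two hyperedges. Patch 3 is selected by neither and passes through unchanged. A trained model would more typically add self-loops than rely on that fallback.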