Graph Topology Information Enhanced Heterogeneous Graph Representation Learning

arXiv cs.LG / 4/8/2026


Key Points

  • The paper argues that real-world heterogeneous graphs are noisy and often poorly aligned with downstream task needs, which degrades heterogeneous graph representation learning (GRL) performance.
  • It identifies two gaps in existing graph structure learning (GSL): most methods target homogeneous graphs, and applying homogeneous GRL models directly to heterogeneous graphs can cause memory issues.
  • The proposed ToGRL framework uses a two-stage approach where a GSL module extracts task-relevant latent topology from the raw graph, converts it into topology embeddings, and constructs a new graph with smoother signals.
  • By separating adjacency matrix optimization from node representation learning, ToGRL aims to reduce memory consumption while improving downstream task effectiveness.
  • The method further uses prompt tuning to improve adaptability to downstream tasks, and experiments on five real-world datasets report large gains over state-of-the-art baselines.
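The two-stage idea in the key points can be sketched end to end: first derive topology embeddings from the raw node features and build a new sparse graph from them, then learn representations on that new graph. Everything below is a hypothetical stand-in for the paper's modules, since the summary gives no equations: the linear projection, the top-k cosine-similarity graph construction, and the mean-aggregation smoothing are common GSL/GNN choices, not ToGRL's actual design.

```python
import numpy as np

def topk_graph_from_embeddings(Z, k=2):
    """Build a sparse adjacency matrix from topology embeddings via
    top-k cosine similarity (a common GSL construction; the paper's
    exact similarity measure and sparsification are not given here)."""
    Zn = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)
    S = Zn @ Zn.T                      # pairwise cosine similarity
    np.fill_diagonal(S, -np.inf)       # exclude self-loops
    n = Z.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(S[i])[-k:]   # k most similar nodes
        A[i, nbrs] = 1.0
    return np.maximum(A, A.T)          # symmetrize

def smooth_signals(X, A, steps=1):
    """One-hop mean aggregation: a minimal proxy for producing
    'smoother graph signals' on the learned structure."""
    D = A.sum(axis=1, keepdims=True) + 1e-12
    for _ in range(steps):
        X = (A @ X) / D
    return X

# Stage 1: project raw features into (hypothetical) topology embeddings
# and construct a new graph from them.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))            # 6 nodes, 4 raw features
W = rng.normal(size=(4, 3))            # stand-in learned projection
Z = X @ W                              # topology embeddings
A_new = topk_graph_from_embeddings(Z, k=2)

# Stage 2: a representation module takes the new graph as input
# (here reduced to simple feature smoothing).
H = smooth_signals(X, A_new)
```

Because stage 1 only ever materializes similarities of low-dimensional embeddings, the adjacency optimization is decoupled from representation learning, which is the memory-saving separation the key points describe.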

Abstract

Real-world heterogeneous graphs are inherently noisy and usually not in the optimal graph structure for downstream tasks, which often adversely affects the performance of GRL models. Although Graph Structure Learning (GSL) methods have been proposed to learn graph structures and downstream tasks simultaneously, existing methods are predominantly designed for homogeneous graphs, while GSL for heterogeneous graphs remains largely unexplored. Two challenges arise in this context. First, the quality of the input graph structure has a more profound impact on GNN-based heterogeneous GRL models than on their homogeneous counterparts. Second, most existing homogeneous GRL models encounter memory consumption issues when applied directly to heterogeneous graphs. In this paper, we propose a novel Graph Topology learning Enhanced Heterogeneous Graph Representation Learning framework (ToGRL). ToGRL learns high-quality graph structures and representations for downstream tasks by incorporating task-relevant latent topology information. Specifically, a novel GSL module first extracts downstream-task-related topology information from the raw graph structure and projects it into topology embeddings. These embeddings are then used to construct a new graph with smooth graph signals. This two-stage approach to GSL separates the optimization of the adjacency matrix from node representation learning, reducing memory consumption. A representation learning module then takes the new graph as input to learn embeddings for downstream tasks. ToGRL also leverages prompt tuning to better exploit the knowledge embedded in the learned representations, enhancing adaptability to downstream tasks. Extensive experiments on five real-world datasets show that ToGRL outperforms state-of-the-art methods by a large margin.
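The prompt-tuning step mentioned in the abstract can be illustrated with a minimal sketch: keep the learned representations and the task head frozen, and optimize only a small additive prompt vector per downstream task. The additive design, the frozen linear head, and all tensors below are illustrative assumptions; the abstract does not specify ToGRL's actual prompt mechanism.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
H = rng.normal(size=(8, 5))                # frozen node embeddings (stand-ins)
W = rng.normal(size=(5, 3))                # frozen task head, 3 classes
Y = np.eye(3)[rng.integers(0, 3, size=8)]  # one-hot labels for the task

prompt = np.zeros(5)                       # the only trainable parameters

def loss(prompt):
    # Add the task prompt to every node embedding before classifying
    # (one common additive-prompt design, assumed here for illustration).
    P = softmax((H + prompt) @ W)
    return -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))

l0 = loss(prompt)
for _ in range(100):                       # plain gradient descent on the prompt
    Delta = (softmax((H + prompt) @ W) - Y) / len(H)
    prompt -= 0.05 * (Delta @ W.T).sum(axis=0)
```

Only the 5-dimensional `prompt` is updated, so adapting to a new downstream task is far cheaper than fine-tuning the whole representation module, which is the adaptability benefit the abstract attributes to prompt tuning.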