HopRank: Self-Supervised LLM Preference-Tuning on Graphs for Few-Shot Node Classification

arXiv cs.CL / 4/21/2026

📰 News

Key Points

  • HopRank proposes a self-supervised way to do node classification on text-attributed graphs by leveraging graph topology and the homophily principle (connected nodes tend to share classes).
  • The method reformulates node classification as a link-prediction-style problem, creating preference data through hierarchical hop-based sampling and then tuning an LLM with adaptive preference learning using zero class labels.
  • During inference, HopRank classifies nodes by predicting their connection preferences to labeled anchor nodes, rather than relying on the LLM to directly map node text to labels.
  • The authors report that experiments on three TAG benchmarks show performance that matches fully supervised GNNs and significantly exceeds prior graph-LLM approaches, while using no labeled data for training.
  • The framework also includes an adaptive early-exit voting mechanism that reduces inference cost by halting once the vote outcome is sufficiently confident.
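The self-supervised step above — building preference data from graph topology alone — can be sketched as follows. This is a minimal illustration under assumed names (`hop_distances`, `sample_preference_pair` are hypothetical helpers, not the paper's code): a 1-hop neighbor is treated as the "preferred" candidate and a farther or disconnected node as the "rejected" one, yielding preference triples with zero class labels.

```python
# Hop-based preference-pair construction (illustrative sketch; the paper's
# exact hierarchical sampling scheme may differ).
from collections import deque
import random

def hop_distances(adj, source, max_hops=3):
    """BFS hop distances from `source`, capped at `max_hops`."""
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        if dist[u] == max_hops:
            continue
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist

def sample_preference_pair(adj, node, rng=random):
    """Return (node, preferred, rejected): under homophily, a 1-hop
    neighbor is preferred over a strictly farther (or unreachable)
    node -- no class labels are needed."""
    dist = hop_distances(adj, node)
    near = [v for v, d in dist.items() if d == 1]
    far = [v for v in adj if v not in dist or dist[v] > 1]
    if not near or not far:
        return None
    return node, rng.choice(near), rng.choice(far)
```

Triples like these can then serve as chosen/rejected pairs for standard LLM preference-tuning objectives, with the hop gap acting as a label-free proxy for class agreement.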

Abstract

Node classification on text-attributed graphs (TAGs) is a fundamental task with broad applications in citation analysis, social networks, and recommendation systems. Current GNN-based approaches suffer from shallow text encoding and heavy dependence on labeled data, limiting their effectiveness in label-scarce settings. While large language models (LLMs) naturally address the text understanding gap with deep semantic reasoning, existing LLM-for-graph methods either still require abundant labels during training or fail to exploit the rich structural signals freely available in graph topology. Our key observation is that, in many real-world TAGs, edges predominantly connect similar nodes under the homophily principle, meaning graph topology inherently encodes class structure without any labels. Building on this insight, we reformulate node classification as a link prediction task and present HopRank, a fully self-supervised LLM-tuning framework for TAGs. HopRank constructs preference data via hierarchical hop-based sampling and employs adaptive preference learning to prioritize informative training signals without any class labels. At inference, nodes are classified by predicting their connection preferences to labeled anchors, with an adaptive early-exit voting scheme to improve efficiency. Experiments on three TAG benchmarks show that HopRank matches fully-supervised GNNs and substantially outperforms prior graph-LLM methods, despite using zero labeled training data.
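The inference procedure described above — classifying a node by its predicted connection preferences to labeled anchors, with early exit — can be sketched as below. The `score` interface is an assumption standing in for the tuned LLM's link-preference prediction; the early-exit rule here (stop once the leading class's margin exceeds the anchors remaining) is one simple instantiation, not necessarily the paper's exact criterion.

```python
# Anchor-voting classification with an adaptive early exit (illustrative
# sketch; `score(node, anchor)` is a stand-in for the tuned LLM's
# link-preference score in [0, 1]).
from collections import Counter

def classify_with_early_exit(node, anchors, score, threshold=0.5):
    """Vote over (anchor, label) pairs; each anchor the node 'prefers'
    to link to casts one vote for its label. Exit early once the
    leading class's margin exceeds the number of anchors left."""
    votes = Counter()
    remaining = len(anchors)
    for anchor, label in anchors:
        if score(node, anchor) > threshold:
            votes[label] += 1
        remaining -= 1
        if votes:
            ranked = votes.most_common(2)
            lead = ranked[0][1] - (ranked[1][1] if len(ranked) > 1 else 0)
            if lead > remaining:  # outcome can no longer change
                break
    return votes.most_common(1)[0][0] if votes else None
```

The early exit preserves the exact majority-vote result while skipping anchors whose votes could not overturn the current leader, which is where the reported inference savings come from.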