Mini-Batch Class Composition Bias in Link Prediction

arXiv cs.AI / April 30, 2026

Key Points

  • The paper argues that representations learned by GNNs for node classification do not necessarily transfer to link prediction for a fixed graph, contrary to prior intuition.
  • It finds that widely used link prediction models can exploit a trivial mini-batch-dependent heuristic, enabled by batch-normalization layers, to solve the edge classification task (a minimal sketch of this leak appears after this list).
  • After correcting for this shortcut behavior, the authors observe stronger alignment between the learned network representations and features relevant to node classification.
  • The results imply that conventional link-prediction training may overstate how well link predictors learn a generalized representation of a graph that is consistent across tasks.
  • The study (arXiv, v1) contributes new evidence on what GNN link prediction models are actually learning.
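
Here is a minimal, self-contained sketch of the kind of leak the second point describes. It is not the paper's model or data: the feature dimension, batch construction, and scorer are illustrative stand-ins, meant only to show that a BatchNorm layer in training mode makes one edge's score depend on its batchmates.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy edge scorer: a linear score followed by BatchNorm1d. In training
# mode, BatchNorm normalizes with *batch* statistics, so a sample's
# output depends on the other samples in its mini-batch.
scorer = nn.Sequential(nn.Linear(8, 1), nn.BatchNorm1d(1))
scorer.train()

edge = torch.randn(1, 8)              # one fixed query edge
pos_mates = torch.randn(63, 8) + 1.0  # "positive-like" batchmates
neg_mates = torch.randn(63, 8) - 1.0  # "negative-like" batchmates

with torch.no_grad():
    score_in_pos_batch = scorer(torch.cat([edge, pos_mates]))[0]
    score_in_neg_batch = scorer(torch.cat([edge, neg_mates]))[0]

# The same edge receives different scores depending purely on batch
# composition -- the kind of signal a trained link predictor can latch
# onto instead of learning graph structure.
print(score_in_pos_batch.item(), score_in_neg_batch.item())
```

If mini-batches are built so that class composition correlates with the labels (for example, positives and negatives sampled in fixed proportions per batch), this dependence becomes a usable shortcut.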

Abstract

Prior work on node classification has shown that Graph Neural Networks (GNNs) can learn representations that transfer across graphs when underlying graph properties are shared. For a fixed graph, one would then expect GNNs trained for link prediction to learn a representation consistent with that learnt for node classification. We show this intuition does not hold in the general case. Instead, we find popular link prediction models can learn a trivial mini-batch-dependent heuristic, enabled by batch-normalisation layers, to solve the edge classification task. When correcting for this, we observe increased alignment of the network representation with node-class relevant features, suggesting the network has learnt a graph representation that better aligns with the underlying graph's properties. Our findings suggest that standard link prediction training may be leading us to overestimate link predictors' ability to learn a generalised representation of a graph that is consistent across tasks.
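
To make the batch-dependence concrete, one generic way to rule out this class of shortcut (not necessarily the correction the authors apply) is to normalise each sample independently, for example with LayerNorm instead of BatchNorm, so that an edge's score cannot vary with its batchmates. A hypothetical sketch, reusing the toy setup above:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Per-sample normalisation (LayerNorm) in place of BatchNorm: each
# edge is normalised over its own features, with no batch statistics.
scorer = nn.Sequential(nn.Linear(8, 16), nn.LayerNorm(16),
                       nn.ReLU(), nn.Linear(16, 1))
scorer.train()

edge = torch.randn(1, 8)
pos_mates = torch.randn(63, 8) + 1.0
neg_mates = torch.randn(63, 8) - 1.0

with torch.no_grad():
    score_in_pos_batch = scorer(torch.cat([edge, pos_mates]))[0]
    score_in_neg_batch = scorer(torch.cat([edge, neg_mates]))[0]

# The fixed edge now gets the same score in both batches: batch
# composition can no longer act as a label-leaking side channel.
assert torch.allclose(score_in_pos_batch, score_in_neg_batch)
```

Other mitigations in the same spirit would be keeping BatchNorm in eval mode (running statistics) or ensuring every mini-batch has the same class composition; the paper itself should be consulted for the authors' exact correction.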