Rethinking Multi-Label Node Classification: Do Tuned Classic GNNs Suffice?

arXiv cs.LG / 5/5/2026

Key Points

  • The paper questions whether recent gains in multi-label node classification (MLNC) come from specialized label-aware architectures or from inadequately tuned baselines.
  • It re-evaluates MLNC using a strong-baseline approach with carefully optimized full-graph GNN backbones such as GCN, SSGConv, and GCNII.
  • The authors apply standard but impactful training design choices (normalization, dropout, and residual connections) to these classic models; a sketch of such a tuned backbone follows this list.
  • Experiments on five benchmark datasets show the tuned baselines outperform specialized methods on four of the five and reach state-of-the-art results in multiple settings.
  • The findings suggest that careful tuning of classic GNNs is a major, sometimes overlooked factor, and they call for more rigorous strong-baseline evaluations in future MLNC research.
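
To make the tuning choices above concrete, here is a minimal sketch of a GCN-style backbone that combines normalization, dropout, and residual connections. This is not the authors' code: the layer widths, LayerNorm placement, activation, and the use of PyTorch Geometric's `GCNConv` are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' implementation) of a "tuned" GCN backbone
# with the standard tricks the paper highlights: normalization, dropout, and
# residual connections. All hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class TunedGCN(nn.Module):
    def __init__(self, in_dim, hidden_dim, num_labels, num_layers=3, dropout=0.5):
        super().__init__()
        # Project inputs to hidden_dim so residual additions have matching shapes.
        self.input_proj = nn.Linear(in_dim, hidden_dim)
        self.convs = nn.ModuleList(
            [GCNConv(hidden_dim, hidden_dim) for _ in range(num_layers)]
        )
        self.norms = nn.ModuleList(
            [nn.LayerNorm(hidden_dim) for _ in range(num_layers)]
        )
        self.dropout = dropout
        # One logit per label: multi-label outputs are scored independently.
        self.out = nn.Linear(hidden_dim, num_labels)

    def forward(self, x, edge_index):
        h = self.input_proj(x)
        for conv, norm in zip(self.convs, self.norms):
            h_res = h                                   # residual branch
            h = conv(h, edge_index)                     # graph convolution
            h = norm(h)                                 # normalization
            h = F.relu(h)
            h = F.dropout(h, p=self.dropout, training=self.training)
            h = h + h_res                               # residual connection
        return self.out(h)                              # raw logits, shape [N, L]
```

The same recipe would apply to the other backbones named in the paper (SSGConv, GCNII), swapping only the convolution operator.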

Abstract

Multi-label node classification (MLNC) has recently been addressed by increasingly complex label-aware designs that explicitly model node-label interactions and inter-label dependencies. However, it remains unclear whether the advantages of these methods truly stem from their specialized designs, or simply from insufficiently optimized baselines. In this paper, we revisit MLNC from a strong-baseline perspective and investigate whether carefully tuned classic full-graph GNNs can already serve as strong solutions to this task. We systematically study several representative backbones, including GCN, SSGConv, and GCNII, and optimize them using standard yet effective techniques such as normalization, dropout, and residual connections. Experiments on five representative benchmark datasets show that our tuned baselines outperform representative specialized methods on four datasets and achieve state-of-the-art performance in multiple settings. These results indicate that careful tuning of classic backbones is a highly influential but often overlooked factor in MLNC, and highlight the need for more rigorous strong-baseline evaluation in future research on multi-label graph learning.
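
For readers less familiar with the multi-label setting: each node carries a multi-hot label vector, so training uses an independent binary cross-entropy term per label rather than a softmax over classes. Below is a minimal sketch of a training step and a micro-F1 evaluation; the `model`, feature, edge, label, and mask tensors are hypothetical placeholders, and the paper does not publish this exact loop or necessarily use these exact metrics.

```python
# Minimal sketch of a multi-label training step and evaluation, assuming a
# node-classification `model`, features `x`, graph `edge_index`, a multi-hot
# target `y` of shape [N, L], and boolean node masks. Illustrative only.
import torch
import torch.nn.functional as F
from sklearn.metrics import f1_score

def train_step(model, optimizer, x, edge_index, y, train_mask):
    model.train()
    optimizer.zero_grad()
    logits = model(x, edge_index)
    # Independent binary cross-entropy per label (no softmax across labels).
    loss = F.binary_cross_entropy_with_logits(logits[train_mask], y[train_mask])
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def evaluate(model, x, edge_index, y, mask, threshold=0.5):
    model.eval()
    probs = torch.sigmoid(model(x, edge_index)[mask])
    preds = (probs > threshold).int().cpu().numpy()
    # Micro-F1 is a common MLNC metric; the paper's exact metrics may differ.
    return f1_score(y[mask].cpu().numpy(), preds, average="micro")
```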