G-Loss: Graph-Guided Fine-Tuning of Language Models

arXiv cs.CL / April 29, 2026


Key Points

  • Traditional loss functions used to fine-tune language models tend to focus on local neighborhoods in embedding space and miss the global semantic structure.
  • The paper introduces G-Loss, a graph-guided loss that uses semi-supervised label propagation over a document-similarity graph to capture global relationships on the embedding manifold (a sketch of such a graph follows this list).
  • By leveraging these structural signals, G-Loss aims to produce more discriminative and robust embeddings for downstream tasks.
  • Experiments on five benchmark datasets for classification (MR, R8/R52, Ohsumed, and 20NG) show faster convergence in most setups and higher classification accuracy than fine-tuning with conventional losses.
  • Overall, the results suggest that incorporating global semantic structure via graph-based objectives can improve language model fine-tuning quality.
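The summary doesn't give the paper's exact graph construction, but a minimal sketch of the kind of document-similarity graph a loss like this relies on could look as follows. Everything here is an illustrative assumption rather than the authors' implementation: the function name `build_similarity_graph`, the use of cosine similarity over pooled BERT embeddings, the kNN sparsification, and the parameter `k`.

```python
import torch
import torch.nn.functional as F

def build_similarity_graph(embeddings: torch.Tensor, k: int = 10) -> torch.Tensor:
    """Build a row-normalized kNN affinity matrix from document embeddings.

    embeddings: (n, d) tensor, e.g. pooled BERT vectors for n documents.
    Returns an (n, n) transition matrix usable for label propagation.
    (Hypothetical sketch; not the paper's actual construction.)
    """
    z = F.normalize(embeddings, dim=1)      # unit-norm rows
    sim = z @ z.t()                         # pairwise cosine similarity
    sim.fill_diagonal_(0.0)                 # drop self-loops
    # Keep only the k strongest edges per node to sparsify the graph.
    vals, idx = sim.topk(k, dim=1)
    adj = torch.zeros_like(sim).scatter_(1, idx, vals.clamp(min=0.0))
    adj = torch.maximum(adj, adj.t())       # symmetrize
    # Row-normalize so each row sums to 1 (a random-walk transition matrix).
    return adj / adj.sum(dim=1, keepdim=True).clamp(min=1e-12)
```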

Abstract

Traditional loss functions, including cross-entropy, contrastive, triplet, and supervised contrastive losses, used for fine-tuning pre-trained language models such as BERT, operate only within local neighborhoods and fail to account for the global semantic structure. We present G-Loss, a graph-guided loss function that incorporates semi-supervised label propagation to exploit structural relationships within the embedding manifold. G-Loss builds a document-similarity graph that captures global semantic relationships, thereby guiding the model to learn more discriminative and robust embeddings. We evaluate G-Loss on five benchmark datasets covering key downstream classification tasks: MR (sentiment analysis), R8 and R52 (topic categorization), Ohsumed (medical document classification), and 20NG (news categorization). In the majority of experimental setups, G-Loss converges faster and produces semantically coherent embedding spaces, resulting in higher classification accuracy than models fine-tuned with traditional loss functions.
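The abstract does not spell out how propagation and the loss are combined, so the following is one plausible reading rather than the paper's formulation: classic iterative label propagation (in the style of Zhou et al., 2004) over a graph like the one above, followed by a soft cross-entropy against the propagated labels. The names `propagate_labels` and `graph_guided_loss`, the clamping of labeled rows, and the hyperparameters `alpha` and `n_iters` are all hypothetical.

```python
import torch
import torch.nn.functional as F

def propagate_labels(P: torch.Tensor, y_onehot: torch.Tensor,
                     labeled_mask: torch.Tensor,
                     alpha: float = 0.9, n_iters: int = 20) -> torch.Tensor:
    """Spread known labels over the graph to get soft labels for every node.

    P: (n, n) row-normalized affinity matrix (e.g. from build_similarity_graph).
    y_onehot: (n, c) one-hot labels; rows for unlabeled documents are all zero.
    labeled_mask: (n,) bool tensor, True where the gold label is known.
    (Illustrative sketch; the paper's propagation scheme may differ.)
    """
    f = y_onehot.clone()
    for _ in range(n_iters):
        f = alpha * (P @ f) + (1.0 - alpha) * y_onehot  # diffuse + re-inject
        f[labeled_mask] = y_onehot[labeled_mask]        # clamp known labels
    return f / f.sum(dim=1, keepdim=True).clamp(min=1e-12)

def graph_guided_loss(logits: torch.Tensor,
                      soft_labels: torch.Tensor) -> torch.Tensor:
    """Soft cross-entropy between classifier logits and propagated labels."""
    log_probs = F.log_softmax(logits, dim=1)
    return -(soft_labels * log_probs).sum(dim=1).mean()
```

In a fine-tuning loop one would typically rebuild the graph from the encoder's current embeddings each batch or epoch, run the propagation without tracking gradients (e.g. under `torch.no_grad()`), and backpropagate only through `graph_guided_loss`.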