Federated Semi-Supervised Graph Neural Networks with Prototype-Guided Pseudo-Labeling for Privacy-Preserving Gestational Diabetes Mellitus Prediction

arXiv cs.LG / 5/5/2026


Key Points

  • The paper introduces FedTGNN-SS, a privacy-preserving federated semi-supervised graph neural network framework for early Gestational Diabetes Mellitus (GDM) risk prediction from EHR data.
  • It addresses two coupled real-world constraints: label scarcity, handled with prototype-guided pseudo-labeling plus neighborhood agreement, and data privacy, handled by federating across hospitals while sharing only class-level centroids.
  • Each hospital constructs and periodically refines a local k-NN patient similarity graph using learned embeddings, and applies clinical-aware consistency augmentation specifically to continuous variables.
  • Experiments on three diabetes-related datasets show FedTGNN-SS achieves 56 statistically significant wins (p < 0.05) against 11 federated baselines and maintains strong AUROC even at extreme missing-label rates (0.8037 on Pima and 0.9634 on Early Stage, both at 80% missing labels).
  • The results suggest the approach is effective for clinical tabular EHR settings where confirmed diagnostic labels are limited and cross-institution data sharing is restricted.
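
To make the pseudo-labeling idea in the key points more concrete, here is a minimal NumPy sketch of one plausible reading: an unlabeled record takes the class of its nearest prototype only if enough of its k-NN neighbors agree. This is not the paper's implementation. The function names, the Euclidean distance, the `min_agree` threshold, and the convention that `-1` marks an unlabeled record are all assumptions made for illustration.

```python
import numpy as np

def knn_indices(emb, k):
    """Indices of the k nearest neighbors (Euclidean) of each row, excluding itself."""
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def class_prototypes(emb, labels, n_classes):
    """Mean embedding of the labeled records of each class.
    Assumes every class has at least one labeled record."""
    return np.stack([emb[labels == c].mean(axis=0) for c in range(n_classes)])

def pseudo_label(emb, labels, n_classes, k=3, min_agree=0.5):
    """Assign a pseudo-label to an unlabeled record (label -1) when the class of
    its nearest prototype is shared by at least `min_agree` of its neighbors.
    The threshold rule is a hypothetical choice, not taken from the paper."""
    protos = class_prototypes(emb, labels, n_classes)
    dist = np.linalg.norm(emb[:, None, :] - protos[None, :, :], axis=-1)
    proto_pred = dist.argmin(axis=1)          # nearest-prototype class per record
    nbrs = knn_indices(emb, k)
    out = labels.copy()
    for i in np.where(labels == -1)[0]:
        # Neighbors vote with their true label if known, else their prototype class.
        votes = np.where(labels[nbrs[i]] >= 0, labels[nbrs[i]], proto_pred[nbrs[i]])
        if np.mean(votes == proto_pred[i]) >= min_agree:
            out[i] = proto_pred[i]
    return out

# Toy embeddings: two well-separated clusters, one labeled record in each.
demo_emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
demo_labels = np.array([0, -1, -1, 1, -1, -1])
demo_out = pseudo_label(demo_emb, demo_labels, n_classes=2, k=2)
```

On this toy input every unlabeled record inherits the label of its cluster. Records that fail the agreement check simply stay at `-1`, so a later round can revisit them once the embeddings and the graph have been refined.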

Abstract

Gestational Diabetes Mellitus (GDM) is a high-prevalence pregnancy complication that requires accurate early risk stratification to reduce maternal and fetal morbidity. However, real-world clinical deployment of machine learning is hindered by two coupled constraints: (i) label scarcity, where a large fraction of electronic health records (EHR) lack confirmed diagnostic labels, and (ii) data privacy, which prevents sharing patient-level data across hospitals. This paper proposes FedTGNN-SS, a privacy-preserving federated semi-supervised framework for clinical tabular EHR. Each hospital builds a local k-nearest-neighbor patient similarity graph and trains a topology-adaptive GNN encoder. To robustly exploit unlabeled records, FedTGNN-SS combines (1) prototype-guided pseudo-labeling with neighborhood agreement, (2) adaptive graph refinement that periodically updates the k-NN graph using learned embeddings, (3) clinical-aware consistency augmentation applied only to continuous variables, and (4) privacy-safe prototype sharing that exchanges only class-level centroids. Across three diabetes-related datasets (GDM: N = 3,525; Pima: N = 768; Early Stage: N = 520) under 10%-80% missing labels per silo, FedTGNN-SS achieves 56 significant wins (p < 0.05) against 11 federated baselines and attains strong AUROC under extreme scarcity (Pima: 0.8037 at 80% missing, Early Stage: 0.9634 at 80% missing).
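
Components (3) and (4) of the abstract can be illustrated with a short NumPy sketch. It is a rough approximation under stated assumptions, not the authors' method: the abstract does not say how centroids are aggregated or what the augmentation looks like, so the count-weighted averaging on the server, the Gaussian noise scaled by each column's standard deviation, and all function and parameter names below are hypothetical.

```python
import numpy as np

def local_centroids(emb, labels, n_classes):
    """One hospital's message: per-class centroids and per-class counts.
    No patient-level rows leave the site. Label -1 (unlabeled) is ignored."""
    cents = np.zeros((n_classes, emb.shape[1]))
    counts = np.zeros(n_classes)
    for c in range(n_classes):
        mask = labels == c
        counts[c] = mask.sum()
        if counts[c] > 0:
            cents[c] = emb[mask].mean(axis=0)
    return cents, counts

def aggregate_centroids(client_msgs):
    """Server side: count-weighted average of client centroids per class.
    Count weighting is an assumption made for this sketch."""
    cents = np.stack([m[0] for m in client_msgs])    # (clients, classes, dim)
    counts = np.stack([m[1] for m in client_msgs])   # (clients, classes)
    total = counts.sum(axis=0)                       # (classes,)
    weighted = (cents * counts[:, :, None]).sum(axis=0)
    return weighted / np.maximum(total, 1)[:, None]  # guard against empty classes

def augment_continuous(x, continuous_cols, scale=0.05, rng=None):
    """Perturb only the continuous columns; categorical columns are untouched.
    Gaussian noise scaled by column std is a stand-in for the paper's
    'clinical-aware' augmentation, whose details are not given here."""
    rng = np.random.default_rng() if rng is None else rng
    out = x.astype(float).copy()
    noise = rng.normal(0.0, scale, size=(x.shape[0], len(continuous_cols)))
    out[:, continuous_cols] += noise * out[:, continuous_cols].std(axis=0)
    return out

# Two toy hospitals with 2-D embeddings and two classes.
msg_a = local_centroids(np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]),
                        np.array([0, 0, 1]), n_classes=2)
msg_b = local_centroids(np.array([[4.0, 0.0], [9.0, 9.0]]),
                        np.array([0, -1]), n_classes=2)
global_protos = aggregate_centroids([msg_a, msg_b])

# Columns 0 and 2 are continuous; column 1 is a binary flag.
demo_x = np.array([[1.0, 0.0, 50.0], [2.0, 1.0, 60.0], [3.0, 0.0, 70.0]])
demo_aug = augment_continuous(demo_x, [0, 2], rng=np.random.default_rng(0))
```

In a consistency-regularization setup, the model's predictions on `demo_x` and `demo_aug` would be encouraged to match. Restricting the perturbation to continuous columns avoids producing invalid categorical codes.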
