CLiGNet: Clinical Label-Interaction Graph Network for Medical Specialty Classification from Clinical Transcriptions

arXiv cs.AI / 3/25/2026


Key Points

  • The authors identify a data leakage issue in prior work using the MTSamples benchmark (due to SMOTE applied before train/test splitting) and release a leakage-free benchmark across 40 medical specialties that shows the task is harder than previously reported.
  • They propose CLiGNet, combining a Bio ClinicalBERT encoder with a two-layer GCN over a specialty label graph built from semantic similarity plus ICD-10 chapter priors.
  • CLiGNet uses per-label attention gating and focal binary cross-entropy loss to address extreme class imbalance (181:1), improving macro F1 over several baselines.
  • In experiments, the GCN label-graph component delivers the largest improvement (about +0.066 macro F1), while Platt-scaling calibration reduces expected calibration error to 0.007 for better probability reliability.
  • The paper includes failure analysis (specialty confusions, rare-class behavior, document length effects) and token-level Integrated Gradients attribution to support clinical NLP deployment decisions.
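The leakage point above hinges on ordering: if SMOTE-style synthetic oversampling runs before the train/test split, synthetic points interpolated from (future) test samples leak into training. A minimal sketch of the leakage-free order, using a toy interpolation-based oversampler in plain numpy (illustrative only; the paper's pipeline presumably uses a standard SMOTE implementation):

```python
import numpy as np

def smote_like_oversample(X, y, minority=1, rng=None):
    """Toy SMOTE-style oversampler: interpolate each minority sample toward
    another random minority sample until the two classes are balanced."""
    if rng is None:
        rng = np.random.default_rng(0)
    X_min = X[y == minority]
    n_needed = (y != minority).sum() - len(X_min)
    synth = []
    for _ in range(n_needed):
        i, j = rng.choice(len(X_min), size=2, replace=False)
        lam = rng.random()  # interpolation coefficient in [0, 1)
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    X_new = np.vstack([X, synth])
    y_new = np.concatenate([y, np.full(n_needed, minority)])
    return X_new, y_new

rng = np.random.default_rng(42)
y = np.tile([1, 0, 0, 0, 0], 20)   # 20 positives, 80 negatives (toy data)
X = rng.normal(size=(100, 5))

# Leakage-free order: split FIRST, then oversample the training fold only.
X_tr, y_tr = smote_like_oversample(X[:80], y[:80])  # train fold balanced
X_te, y_te = X[80:], y[80:]                         # test fold untouched
```

Running the oversampler on the full dataset before splitting would instead scatter synthetic neighbours of test points into the training fold, inflating reported scores.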

Abstract

Automated classification of clinical transcriptions into medical specialties is essential for routing, coding, and clinical decision support, yet prior work on the widely used MTSamples benchmark suffers from severe data leakage caused by applying SMOTE oversampling before train/test splitting. We first document this methodological flaw and establish a leakage-free benchmark across 40 medical specialties (4966 records), revealing that the true task difficulty is substantially higher than previously reported. We then introduce CLiGNet (Clinical Label-Interaction Graph Network), a neural architecture that combines a Bio ClinicalBERT text encoder with a two-layer Graph Convolutional Network operating on a specialty label graph constructed from semantic similarity and ICD-10 chapter priors. Per-label attention gates fuse document and label-graph representations, trained with focal binary cross-entropy loss to handle extreme class imbalance (181:1 ratio). Across seven baselines ranging from TF-IDF classifiers to Clinical Longformer, CLiGNet without calibration achieves the highest macro F1 of 0.279, with an ablation study confirming that the GCN label graph provides the single largest component gain (+0.066 macro F1). Adding per-label Platt-scaling calibration yields an expected calibration error of 0.007, demonstrating a principled trade-off between ranking performance and probability reliability. We provide comprehensive failure analysis covering pairwise specialty confusions, rare-class behaviour, document-length effects, and token-level Integrated Gradients attribution, offering actionable insights for clinical NLP system deployment.
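The two-layer GCN over the label graph follows the standard propagation rule H' = σ(D^{-1/2}(A + I)D^{-1/2} H W). A numpy sketch with hypothetical dimensions (40 labels, toy 16-dim embeddings; the adjacency here is random, whereas the paper builds it from semantic similarity and ICD-10 chapter priors):

```python
import numpy as np

rng = np.random.default_rng(0)
L, d = 40, 16                     # 40 specialty labels, toy embedding size

# Symmetric label adjacency (random stand-in for similarity + ICD-10 priors)
A = (rng.random((L, L)) > 0.9).astype(float)
A = np.maximum(A, A.T)
A_hat = A + np.eye(L)                       # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt    # D^-1/2 (A+I) D^-1/2

X = rng.normal(size=(L, d))                 # initial label embeddings
W1 = rng.normal(size=(d, d)) * 0.1
W2 = rng.normal(size=(d, d)) * 0.1

H1 = np.maximum(A_norm @ X @ W1, 0.0)       # layer 1: propagate + ReLU
H2 = A_norm @ H1 @ W2                       # layer 2: refined label vectors
```

Each label's final representation thus mixes in information from related specialties (e.g. labels sharing an ICD-10 chapter), which is what the ablation credits with the largest single gain.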
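The focal binary cross-entropy loss down-weights easy, well-classified examples so that rare specialties (up to 181:1 imbalance) still contribute gradient. A minimal numpy sketch; the γ and α values are common defaults, not necessarily the paper's settings:

```python
import numpy as np

def focal_bce(p, y, gamma=2.0, alpha=0.25):
    """Focal binary cross-entropy: -alpha_t * (1 - p_t)^gamma * log(p_t),
    averaged over labels. p_t is the predicted probability of the true class."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pt = np.where(y == 1, p, 1 - p)
    a = np.where(y == 1, alpha, 1 - alpha)
    return -(a * (1 - pt) ** gamma * np.log(pt)).mean()

y = np.array([1, 0, 0, 0])                        # one rare positive label
easy = focal_bce(np.array([0.9, 0.1, 0.1, 0.1]), y)  # confident, correct
hard = focal_bce(np.array([0.3, 0.1, 0.1, 0.1]), y)  # positive nearly missed
```

The (1 - p_t)^γ factor makes the nearly-missed positive dominate the loss, while confident correct predictions contribute almost nothing.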
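The reported expected calibration error (ECE) of 0.007 after per-label Platt scaling can be checked against the standard binned definition: the occupancy-weighted gap between accuracy and mean confidence per bin. A numpy sketch (10 bins is a conventional choice, not stated in the summary):

```python
import numpy as np

def ece(probs, labels, n_bins=10):
    """Expected calibration error: sum over bins of
    (bin fraction) * |bin accuracy - bin mean confidence|."""
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            total += mask.mean() * abs(labels[mask].mean() - probs[mask].mean())
    return total

# Synthetic perfectly calibrated predictions should score near zero
rng = np.random.default_rng(0)
p = rng.random(20000)
y = (rng.random(20000) < p).astype(float)
score = ece(p, y)
```

Platt scaling fits a per-label sigmoid on held-out scores to shrink exactly this gap; the paper frames the resulting trade-off between macro-F1 ranking quality and probability reliability.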
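The per-label attention gating is described only at a high level; one plausible parameterization is a scalar sigmoid gate per label computed from the concatenated document and label vectors, mixing the two representations. A hedged numpy sketch (the gate form, dimensions, and scoring head here are assumptions, not the paper's exact architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
L, d = 40, 16
doc = rng.normal(size=(d,))          # document encoding from the text encoder
labels = rng.normal(size=(L, d))     # per-label vectors from the label-graph GCN

Wg = rng.normal(size=(2 * d,)) * 0.1  # gate weights (hypothetical)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One scalar gate per label decides how much label-graph signal to mix in
pairs = np.concatenate([np.tile(doc, (L, 1)), labels], axis=1)  # (L, 2d)
gates = sigmoid(pairs @ Wg)                                     # (L,)
fused = gates[:, None] * labels + (1 - gates)[:, None] * doc    # (L, d)
scores = fused.sum(axis=1)   # toy per-label logits for the multi-label head
```

The point of the gate is that each specialty can weight label-graph context differently, which matters when one document plausibly touches several related specialties.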
