CLiGNet: Clinical Label-Interaction Graph Network for Medical Specialty Classification from Clinical Transcriptions
arXiv cs.AI / 3/25/2026
Key Points
- The authors identify a data leakage issue in prior work using the MTSamples benchmark (due to SMOTE applied before train/test splitting) and release a leakage-free benchmark across 40 medical specialties that shows the task is harder than previously reported.
- They propose CLiGNet, combining a Bio ClinicalBERT encoder with a two-layer GCN over a specialty label graph built from semantic similarity plus ICD-10 chapter priors.
- CLiGNet uses per-label attention gating and focal binary cross-entropy loss to address extreme class imbalance (181:1), improving macro F1 over several baselines.
- In experiments, the GCN label-graph component delivers the largest improvement (about +0.066 macro F1), while Platt-scaling calibration reduces expected calibration error to 0.007 for better probability reliability.
- The paper includes failure analysis (specialty confusions, rare-class behavior, document length effects) and token-level Integrated Gradients attribution to support clinical NLP deployment decisions.
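To make the focal binary cross-entropy idea from the key points concrete, here is a minimal sketch in plain Python. It follows the standard focal loss formulation, in which each per-label loss term is scaled by (1 - p_t)^gamma so that confidently correct predictions contribute little and rare, hard labels dominate the gradient. The function names `focal_bce` and `mean_focal_bce` are hypothetical, and the values `gamma=2.0` and `alpha=0.25` are common defaults assumed for illustration; the summary does not state which settings CLiGNet uses.

```python
import math


def focal_bce(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal binary cross-entropy for one (probability, label) pair.

    gamma and alpha are assumed defaults, not values reported for CLiGNet.
    """
    p = min(max(p, eps), 1.0 - eps)              # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p               # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_bce(probs, labels, gamma=2.0, alpha=0.25):
    """Average focal BCE over all documents and all specialty labels.

    probs and labels are lists of rows, one row per document and
    one entry per label.
    """
    total, count = 0.0, 0
    for prob_row, label_row in zip(probs, labels):
        for p, y in zip(prob_row, label_row):
            total += focal_bce(p, y, gamma, alpha)
            count += 1
    return total / count


# A confident correct prediction is down-weighted far more than a
# confident wrong one for the same positive label.
easy = focal_bce(0.95, 1)
hard = focal_bce(0.10, 1)
print(hard > easy)
```

Setting `gamma=0.0` and `alpha=0.5` reduces this to ordinary binary cross-entropy scaled by one half, which is a convenient sanity check when tuning the focusing parameter.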