Detecting HIV-Related Stigma in Clinical Narratives Using Large Language Models
arXiv cs.CL / 4/10/2026
Key Points
- The study presents an LLM-based approach to detect HIV-related stigma in clinical narratives, addressing the lack of ready-to-use tools for extracting stigma information from clinical notes.
- It uses UF Health clinical notes (2012–2022) and builds a labeled dataset of 1,332 annotated sentences across four stigma subscales: concern with public attitudes, disclosure concerns, negative self-image, and personalized stigma.
- Encoder-based and generative LLMs are benchmarked using zero-shot and few-shot prompting, with GatorTron-large achieving the best overall performance (Micro-F1 = 0.62).
- Few-shot prompting markedly improves the generative models: 5-shot GPT-OSS-20B (Micro-F1 = 0.57) and LLaMA-8B (Micro-F1 = 0.59) perform competitively, while zero-shot generative inference shows notable failure rates (up to 32%).
- Predictive performance varies by subscale: negative self-image is the easiest to detect and personalized stigma the hardest, highlighting targets for future model refinement.
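The evaluation pipeline above can be illustrated with a minimal sketch: assembling a k-shot classification prompt over the four stigma subscales, and scoring multi-label predictions with micro-averaged F1. The prompt format, example sentences, and helper names here are invented for illustration; the paper's actual prompts and scoring code are not shown in this summary.

```python
# Hypothetical sketch of few-shot prompt assembly and micro-F1 scoring
# for multi-label stigma-subscale classification. Subscale names come
# from the paper; everything else is illustrative.

SUBSCALES = [
    "concern with public attitudes",
    "disclosure concerns",
    "negative self-image",
    "personalized stigma",
]

def build_few_shot_prompt(examples, sentence):
    """Assemble a k-shot prompt: labeled examples, then the query sentence."""
    lines = [
        "Label each sentence with any applicable stigma subscales: "
        + "; ".join(SUBSCALES) + "."
    ]
    for text, labels in examples:
        lines.append(f"Sentence: {text}\nLabels: {', '.join(labels) or 'none'}")
    lines.append(f"Sentence: {sentence}\nLabels:")
    return "\n\n".join(lines)

def micro_f1(gold, pred):
    """Micro-averaged F1 over per-sentence label sets (pooled counts)."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))   # true positives
    fp = sum(len(p - g) for g, p in zip(gold, pred))   # false positives
    fn = sum(len(g - p) for g, p in zip(gold, pred))   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0
```

Micro-averaging pools true/false positives across all sentences before computing F1, so frequent subscales dominate the score, which matches how a single overall Micro-F1 like 0.62 would be reported across the four subscales.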
