When PCOS Meets Eating Disorders: An Explainable AI Approach to Detecting the Hidden Triple Burden

arXiv cs.CL / 4/17/2026


Key Points

  • The study targets a “triple burden” in women with PCOS (body image distress, disordered eating, and metabolic challenges) by detecting these co-occurring conditions in social media posts.
  • It introduces small open-source language models fine-tuned to produce grounded, explainable outputs with textual evidence, improving transparency over prior NLP methods.
  • The researchers collected 1,000 PCOS-related posts from six subreddits; two trained annotators labeled them using guidelines that operationalize the clinical framework of Lee et al. (2017).
  • The best-performing model (among Gemma-2-2B, Qwen3-1.7B, and DeepSeek-R1-Distill-Qwen-1.5B) reached 75.3% exact match accuracy on held-out posts, with better comorbidity detection and strong explainability.
  • Results show accuracy drops as diagnostic complexity increases, suggesting the approach is best suited for screening rather than autonomous diagnosis.

Abstract

Women with polycystic ovary syndrome (PCOS) face substantially elevated risks of body image distress, disordered eating, and metabolic challenges, yet existing natural language processing approaches for detecting these conditions lack transparency and cannot identify co-occurring presentations. We developed small, open-source language models to automatically detect this triple burden in social media posts with grounded explainability. We collected 1,000 PCOS-related posts from six subreddits, and two trained annotators labeled the posts using guidelines that operationalize the clinical framework of Lee et al. (2017). Three models (Gemma-2-2B, Qwen3-1.7B, DeepSeek-R1-Distill-Qwen-1.5B) were fine-tuned using Low-Rank Adaptation to generate structured explanations with textual evidence. The best model achieved 75.3 percent exact match accuracy on 150 held-out posts, with robust comorbidity detection and strong explainability. Performance declined with diagnostic complexity, indicating that the models are best used for screening rather than autonomous diagnosis.
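To make the headline metric concrete, the sketch below shows one common way to compute exact match accuracy for multi-label predictions, plus a breakdown by the number of conditions per post. It is an illustration under stated assumptions, not the authors' evaluation code: the label names, the two function names, and the toy data are all hypothetical, and "diagnostic complexity" is approximated here as the number of gold labels on a post.

```python
from collections import defaultdict

# Hypothetical label names for the three conditions; the paper's actual
# annotation schema is not reproduced here.
LABELS = ("body_image_distress", "disordered_eating", "metabolic_challenges")


def exact_match_accuracy(gold, pred):
    """Fraction of posts whose predicted label set equals the gold label set."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    hits = sum(set(g) == set(p) for g, p in zip(gold, pred))
    return hits / len(gold)


def accuracy_by_complexity(gold, pred):
    """Exact match accuracy grouped by the number of gold labels per post.

    Assumption: complexity is proxied by how many conditions co-occur.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for g, p in zip(gold, pred):
        k = len(set(g))
        totals[k] += 1
        hits[k] += int(set(g) == set(p))
    return {k: hits[k] / totals[k] for k in sorted(totals)}


# Toy data (invented for illustration only, not from the study).
gold = [
    {"body_image_distress"},
    {"disordered_eating", "metabolic_challenges"},
    set(),
    set(LABELS),
]
pred = [
    {"body_image_distress"},
    {"disordered_eating"},  # misses one co-occurring condition
    set(),
    set(LABELS),
]

overall = exact_match_accuracy(gold, pred)
per_level = accuracy_by_complexity(gold, pred)
print(overall)
print(per_level)
```

Under this definition a partially correct prediction counts as a miss, which is why exact match is a strict measure for co-occurring conditions. If every held-out post is weighted equally, 75.3 percent of 150 posts corresponds to 113 exact matches (113 / 150 ≈ 0.753).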