Learning Evidence of Depression Symptoms via Prompt Induction

arXiv cs.CL / 4/28/2026

Key Points

  • The paper tackles identifying evidence for 21 depression symptoms in large-scale user-generated text (like social media) to help address limited clinical capacity.
  • It introduces BDI-Sen, a sentence-level dataset annotated for symptom relevance based on BDI-II, highlighting the fine-grained and highly imbalanced nature of the task.
  • The authors report that standard LLM methods (zero-shot, in-context learning, and fine-tuning) have difficulty maintaining consistent relevance criteria across most symptoms.
  • They propose Symptom Induction (SI), which converts labeled examples into short, interpretable symptom-specific guidelines and uses these to condition LLM classification.
  • Across four LLM families and eight models, SI achieves the best overall weighted F1 on BDI-Sen, with especially large gains for infrequent symptoms, and its induced guidelines generalize in cross-domain evaluation to other disorders with overlapping symptoms (bipolar and eating disorders).
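
Because the results are reported as weighted F1 on a highly imbalanced task, it may help to see how that metric is usually computed. The following is a minimal pure-Python sketch of the standard support-weighted F1 (per-class F1 averaged by how often each class occurs in the gold labels). The label names and toy data are hypothetical and not taken from BDI-Sen, and the summary does not state exactly how the paper aggregates scores across symptoms.

```python
from collections import Counter

def f1_for_label(y_true, y_pred, label):
    """F1 for one class, treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero/undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its gold-label support."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        f1_for_label(y_true, y_pred, label) * count / total
        for label, count in support.items()
    )

# Hypothetical toy data: a frequent class and a rare one.
y_true = ["none"] * 8 + ["sadness"] * 2
y_pred = ["none"] * 8 + ["sadness", "none"]
print(round(weighted_f1(y_true, y_pred), 3))  # 0.886
```

In this toy case the rare class has an F1 of about 0.667, yet the weighted score stays near 0.886 because the frequent class dominates the average. That is one reason the reported gains on infrequent symptoms are worth reading alongside the overall number.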

Abstract

Depression places substantial pressure on mental health services, and many people describe their experiences outside clinical settings in high-volume user-generated text (e.g., online forums and social media). Automatically identifying clinical symptom evidence in such text can therefore complement limited clinical capacity and scale to large populations. We address this need through sentence-level classification of 21 depression symptoms from the BDI-II questionnaire, using BDI-Sen, a dataset annotated for symptom relevance. This task is fine-grained and highly imbalanced, and we find that common LLM approaches (zero-shot, in-context learning, and fine-tuning) struggle to apply consistent relevance criteria for most symptoms. We propose Symptom Induction (SI), a novel approach that compresses labeled examples into short, interpretable guidelines that specify what counts as evidence for each symptom and uses these guidelines to condition classification. Across four LLM families and eight models, SI achieves the best overall weighted F1 on BDI-Sen, with especially large gains for infrequent symptoms. Cross-domain evaluation on an external dataset further shows that induced guidelines generalize to other disorders with shared symptomatology (bipolar and eating disorders).
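
The two-stage idea described in the abstract (induce a guideline from labeled examples, then condition classification on it) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the prompt wording, the function names `induce_guideline` and `classify`, and the `toy_llm` stand-in are all hypothetical, and a real system would call an actual LLM in place of the stub.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function mapping a prompt string to a text reply.
LLM = Callable[[str], str]

def induce_guideline(llm: LLM, symptom: str,
                     examples: List[Tuple[str, bool]]) -> str:
    """Stage 1: compress labeled sentences into a short guideline."""
    lines = [
        f"[{'relevant' if label else 'not relevant'}] {text}"
        for text, label in examples
    ]
    prompt = (
        f"Symptom: {symptom}\n"
        "Labeled sentences:\n" + "\n".join(lines) + "\n"
        "Write a short guideline describing what counts as evidence "
        "for this symptom."
    )
    return llm(prompt).strip()

def classify(llm: LLM, symptom: str, guideline: str, sentence: str) -> bool:
    """Stage 2: classify a sentence, conditioning on the induced guideline."""
    prompt = (
        f"Symptom: {symptom}\n"
        f"Guideline: {guideline}\n"
        f"Sentence: {sentence}\n"
        "Is the sentence relevant evidence for the symptom? Answer yes or no."
    )
    return llm(prompt).strip().lower().startswith("yes")

def toy_llm(prompt: str) -> str:
    """Keyword-based stand-in for a real model (assumption, for demo only)."""
    if "Write a short guideline" in prompt:
        return "Relevant if the writer reports sleeping much less or more than usual."
    sentence = prompt.split("Sentence: ")[1].split("\n")[0]
    return "yes" if "sleep" in sentence.lower() else "no"

# Illustrative usage with made-up sentences.
symptom = "Changes in sleeping pattern"
examples = [
    ("I barely sleep three hours a night now.", True),
    ("My sister bought a new car.", False),
]
guideline = induce_guideline(toy_llm, symptom, examples)
print(guideline)
print(classify(toy_llm, symptom, guideline, "I can't sleep at all lately."))
```

Keeping the guideline as plain text is what makes this style of approach inspectable: the induced criteria can be read and audited separately from the classification step.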