Depression Risk Assessment in Social Media via Large Language Models

arXiv cs.CL / 4/23/2026

Key Points

The paper proposes an LLM-based system that assesses depression risk in Reddit posts through multi-label classification of eight depression-associated emotions and the computation of a weighted severity index.
It evaluates the approach in a zero-shot setting on the annotated DepressionEmo dataset (~6,000 posts), then applies it in the wild to 469,692 comments collected from four subreddits in 2024–2025.
The best-performing model, gemma3:27b, reaches a micro-F1 of 0.75 and a macro-F1 of 0.70 without task-specific training, which the authors describe as competitive with purpose-built fine-tuned models such as BART (micro-F1 0.80, macro-F1 0.76).
The in-the-wild results show temporally stable risk profiles across communities and clear differences between r/depression and r/anxiety.
Overall, the authors argue the method provides a feasible and cost-effective way to scale psychological monitoring using social-media language signals.
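
To make the idea of a weighted severity index concrete, here is a minimal sketch of one way such a score could be computed from a post's predicted emotion labels. It is not the authors' formula: the summary does not give the weights or the normalization, so the values in `EMOTION_WEIGHTS` are hypothetical, the division by the total weight is an assumed design choice, and the eight label names are assumed to follow the DepressionEmo label set.

```python
# Hypothetical weights: the source does not state the paper's actual values.
# Label names are assumed to match the DepressionEmo emotion set.
EMOTION_WEIGHTS = {
    "anger": 1.0,
    "cognitive dysfunction": 1.0,
    "emptiness": 2.0,
    "hopelessness": 3.0,
    "loneliness": 1.0,
    "sadness": 1.0,
    "suicide intent": 5.0,
    "worthlessness": 2.0,
}


def severity_index(predicted_emotions, weights=EMOTION_WEIGHTS):
    """Return a score in [0, 1]: weight of predicted labels over total weight.

    Normalizing by the total weight is an assumption made for this sketch.
    """
    predicted = set(predicted_emotions)  # a label counts once per post
    unknown = predicted - set(weights)
    if unknown:
        raise ValueError(f"unknown emotion labels: {sorted(unknown)}")
    total = sum(weights.values())
    return sum(weights[label] for label in predicted) / total


if __name__ == "__main__":
    # Labels as they might be returned by an LLM for a single post.
    print(severity_index(["hopelessness", "suicide intent"]))
```

With a scheme like this, a post tagged only with a low-weight emotion scores close to 0, while a post tagged with several high-weight emotions approaches 1, which is what allows scores to be averaged per community or per time window.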

Abstract

Depression is one of the most prevalent and debilitating mental health conditions worldwide, frequently underdiagnosed and undertreated. The proliferation of social media platforms provides a rich source of naturalistic linguistic signals for the automated monitoring of psychological well-being. In this work, we propose a system based on Large Language Models (LLMs) for depression risk assessment in Reddit posts, through multi-label classification of eight depression-associated emotions and the computation of a weighted severity index. The method is evaluated in a zero-shot setting on the annotated DepressionEmo dataset (~6,000 posts) and applied in-the-wild to 469,692 comments collected from four subreddits over the period 2024-2025. Our best model, gemma3:27b, achieves micro-F1 = 0.75 and macro-F1 = 0.70, results competitive with purpose-built fine-tuned models (BART: micro-F1 = 0.80, macro-F1 = 0.76). The in-the-wild analysis reveals consistent and temporally stable risk profiles across communities, with marked differences between r/depression and r/anxiety. Our findings demonstrate the feasibility of a cost-effective, scalable approach for large-scale psychological monitoring.
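
The abstract reports both micro-F1 and macro-F1. The sketch below shows how the two differ for multi-label predictions: micro-F1 pools true positives, false positives, and false negatives across all labels, while macro-F1 averages the per-label F1 scores, so rare labels weigh more heavily on it. The function names and toy data are illustrative assumptions, not the paper's evaluation code.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined here as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred, labels):
    """Compute (micro_f1, macro_f1) for multi-label data.

    y_true and y_pred are sequences of label collections, one per post.
    """
    counts = {label: [0, 0, 0] for label in labels}  # [tp, fp, fn] per label
    for true_labels, pred_labels in zip(y_true, y_pred):
        true_set, pred_set = set(true_labels), set(pred_labels)
        for label in labels:
            if label in pred_set and label in true_set:
                counts[label][0] += 1  # true positive
            elif label in pred_set:
                counts[label][1] += 1  # false positive
            elif label in true_set:
                counts[label][2] += 1  # false negative

    # Micro: pool the counts over all labels, then compute a single F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1_from_counts(tp, fp, fn)

    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1_from_counts(*c) for c in counts.values()) / len(labels)
    return micro, macro


if __name__ == "__main__":
    # Toy data: the rare label "anger" is never predicted.
    labels = ["sadness", "anger"]
    y_true = [{"sadness"}, {"sadness"}, {"anger"}]
    y_pred = [{"sadness"}, {"sadness"}, {"sadness"}]
    print(micro_macro_f1(y_true, y_pred, labels))
```

In the toy data, missing the rare label lowers macro-F1 more than micro-F1, which is one common reason a macro score sits below the micro score, as it does for both models in the reported results.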