K-SENSE: A Knowledge-Guided Self-Augmented Encoder for Neuro-Semantic Evaluation of Mental Health Conditions on Social Media

arXiv cs.CL · April 28, 2026


Key Points

  • The paper proposes K-SENSE, a unified framework for neuro-semantic evaluation of mental health conditions from social media text, aimed at improving stress and depression detection despite figurative and implicit language.
  • K-SENSE combines external commonsense/psychological reasoning (via COMET across five mental-state dimensions) with robustness techniques (a three-stage encoding pipeline and supervised contrastive learning).
  • It uses a dual-stream encoder to build a “semantic anchor” by projecting and fusing hidden representations into a shared space, while the training objective aligns same-class examples and suppresses irrelevant knowledge noise.
  • Experiments on Dreaddit (stress) and Depression_Mixed (depression) show mean F1 scores of 86.1 and 94.3, improving roughly 2.6 and 1.5 percentage points over the best prior baselines.
  • Ablation studies indicate that individual components—such as the temporal knowledge integration strategy and freezing the knowledge encoder during fine-tuning—each contribute to the overall performance gains.
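The "semantic anchor" described above can be pictured concretely. The following is a minimal, illustrative sketch (not the paper's exact design): pooled hidden states from a text encoder and the five COMET mental-state knowledge vectors are each projected into a shared space, then fused by attention so that irrelevant knowledge can be down-weighted. All dimensions, the residual fusion, and the head count are assumptions.

```python
import torch
import torch.nn as nn

class SemanticAnchor(nn.Module):
    """Illustrative dual-stream fusion: project text and knowledge
    representations into a shared space, then let the text query attend
    over the five mental-state knowledge vectors. Hypothetical sketch,
    not the published K-SENSE architecture."""

    def __init__(self, text_dim=768, know_dim=768, shared_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, shared_dim)
        self.know_proj = nn.Linear(know_dim, shared_dim)
        # Attention lets the model suppress knowledge that is
        # irrelevant to the current post.
        self.attn = nn.MultiheadAttention(shared_dim, num_heads=4,
                                          batch_first=True)

    def forward(self, h_text, h_know):
        # h_text: (batch, text_dim) pooled text representation
        # h_know: (batch, 5, know_dim), one vector per mental-state dimension
        q = self.text_proj(h_text).unsqueeze(1)   # (batch, 1, shared_dim)
        kv = self.know_proj(h_know)               # (batch, 5, shared_dim)
        fused, _ = self.attn(q, kv, kv)           # knowledge-aware fusion
        return (q + fused).squeeze(1)             # anchor in shared space
```

Keeping the knowledge stream behind its own projection also makes it straightforward to freeze the knowledge encoder during fine-tuning, as the ablations suggest.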

Abstract

Early detection of mental health conditions, particularly stress and depression, from social media text remains a challenging open problem in computational psychiatry and natural language processing. Automated systems must contend with figurative language, implicit emotional expression, and the high noise inherent in user-generated content. Existing approaches either leverage external commonsense knowledge to model mental states explicitly, or apply self-augmentation and contrastive training to improve generalization, but seldom do both in a principled, unified framework. We propose K-SENSE (Knowledge-guided Self-augmented Encoder for Neuro-Semantic Evaluation of Mental Health), a framework that jointly exploits external psychological reasoning and internal representation robustness. K-SENSE adopts a three-stage encoding pipeline: (1) inferential commonsense knowledge is extracted from the COMET model across five mental-state dimensions; (2) a semantic anchor is constructed by combining hidden representations from two parallel encoding streams, projected into a shared space before fusion; and (3) a supervised contrastive learning objective aligns same-class representations while encouraging the attention mechanism to suppress irrelevant knowledge noise. We evaluate K-SENSE on Dreaddit (stress detection) and Depression_Mixed (depression detection), achieving mean F1-scores of 86.1 (±0.6) and 94.3 (±0.8), respectively, over five independent runs. These represent improvements of approximately 2.6 and 1.5 percentage points over the strongest prior baselines. Ablation experiments confirm the contribution of each architectural component, including the temporal knowledge integration strategy and the choice to keep the knowledge encoder frozen during fine-tuning.