SleepVLM: Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model

arXiv cs.CL / 3/31/2026


Key Points

  • SleepVLM is introduced as a rule-grounded vision-language model for automated sleep staging from multi-channel polysomnography (PSG) waveform images that also produces clinician-readable rationales tied to AASM scoring criteria.
  • The approach combines waveform-perceptual pre-training with rule-grounded supervised fine-tuning to improve both predictive accuracy and auditability of the model’s decisions.
  • SleepVLM achieves Cohen’s kappa of 0.767 on MASS-SS1 and 0.743 on an external cohort (ZUAMHCS), matching state-of-the-art sleep staging performance.
  • Expert evaluation indicates the generated explanations are high-quality, with mean scores above 4.0/5.0 for factual accuracy, evidence comprehensiveness, and logical coherence.
  • The authors release the MASS-EX dataset, an expert-annotated resource intended to support further research in interpretable sleep medicine.

Abstract

While automated sleep staging has achieved expert-level accuracy, its clinical adoption is hindered by a lack of auditable reasoning. We introduce SleepVLM, a rule-grounded vision-language model (VLM) designed to stage sleep from multi-channel polysomnography (PSG) waveform images while generating clinician-readable rationales based on American Academy of Sleep Medicine (AASM) scoring criteria. Using waveform-perceptual pre-training and rule-grounded supervised fine-tuning, SleepVLM achieved Cohen's kappa scores of 0.767 on a held-out test set (MASS-SS1) and 0.743 on an external cohort (ZUAMHCS), matching state-of-the-art performance. Expert evaluations further validated the quality of the model's reasoning, with mean scores exceeding 4.0/5.0 for factual accuracy, evidence comprehensiveness, and logical coherence. By coupling competitive performance with transparent, rule-based explanations, SleepVLM may improve the trustworthiness and auditability of automated sleep staging in clinical workflows. To facilitate further research in interpretable sleep medicine, we release MASS-EX, a novel expert-annotated dataset.
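
For readers unfamiliar with the headline metric, Cohen's kappa measures agreement between two label sequences after discounting the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the chance agreement implied by each rater's label frequencies. The sketch below is a minimal illustration of that formula only. The function name `cohens_kappa` and the toy epoch labels are assumptions made for this example; they are not taken from the paper, its code, or its data.

```python
from collections import Counter


def cohens_kappa(reference, predicted):
    """Cohen's kappa between two equal-length label sequences.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("sequences must be non-empty and of equal length")

    n = len(reference)

    # Observed agreement: fraction of epochs where both labels match.
    p_o = sum(r == p for r, p in zip(reference, predicted)) / n

    # Chance agreement: for each label, the product of its marginal
    # frequencies in the two sequences, summed over all labels.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    p_e = sum(ref_counts[label] * pred_counts[label] for label in ref_counts) / (n * n)

    if p_e == 1.0:
        # Degenerate case: both sequences use one identical label throughout.
        return 1.0

    return (p_o - p_e) / (1.0 - p_e)


# Toy data (assumed): eight 30-second epochs using the five AASM stage labels.
expert = ["W", "N1", "N2", "N2", "N3", "REM", "N2", "W"]
model = ["W", "N2", "N2", "N2", "N3", "REM", "N1", "W"]

kappa = cohens_kappa(expert, model)
print(f"kappa = {kappa:.3f}")
```

In this toy example the two sequences agree on 6 of 8 epochs (p_o = 0.75) while chance agreement is 0.25, so kappa is lower than raw accuracy. That gap is why kappa is commonly reported for sleep staging, where stage frequencies are imbalanced.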