SleepVLM: Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model
arXiv cs.CL / 3/31/2026
Key Points
- SleepVLM is introduced as a rule-grounded vision-language model for automated sleep staging from multi-channel polysomnography (PSG) waveform images that also produces clinician-readable rationales tied to AASM scoring criteria.
- The approach combines waveform-perceptual pre-training with rule-grounded supervised fine-tuning to improve both predictive accuracy and auditability of the model’s decisions.
- SleepVLM achieves Cohen’s kappa of 0.767 on MASS-SS1 and 0.743 on an external cohort (ZUAMHCS), matching state-of-the-art sleep staging performance.
- Expert evaluation indicates the generated explanations are high-quality, with mean scores above 4.0/5.0 for factual accuracy, evidence completeness, and logical coherence.
- The authors release the MASS-EX dataset, an expert-annotated resource intended to support further research in interpretable sleep medicine.
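The headline numbers above are Cohen's kappa scores, a chance-corrected agreement statistic commonly used to compare automated sleep staging against expert scoring. As background, here is a minimal sketch of how the statistic is computed; the stage sequences below are invented for illustration and are not from the paper:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of epochs given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 5-class sleep-stage labels (W, N1, N2, N3, REM), one per epoch.
model  = ["W", "N1", "N2", "N2", "N3", "REM", "N2", "W"]
expert = ["W", "N2", "N2", "N2", "N3", "REM", "N1", "W"]
print(round(cohens_kappa(model, expert), 3))  # → 0.667
```

A kappa of 0 means agreement no better than chance and 1 means perfect agreement, so the reported 0.767 and 0.743 indicate substantial agreement with expert scorers.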