Introducing WARM-VR: Benchmark Dataset for Multimodal Wearable Affect Recognition in Virtual Reality

arXiv cs.LG / 5/4/2026


Key Points

  • The paper introduces WARM-VR, a new publicly available multimodal benchmark dataset for wearable affect recognition specifically targeting immersive VR settings.
  • Data were collected from 31 participants using wearable sensors (wristband: BVP, EDA, skin temperature, acceleration; chest strap: ECG) while they experienced VR sessions that elicited stress then relaxation.
  • The VR experience included synchronized multimodal stimuli—visual, auditory, and olfactory—to study how multisensory cues affect emotional state changes.
  • Subjective questionnaire results indicate VR relaxation significantly reduces negative affect, with olfactory enhancement showing particular benefit.
  • Baseline experiments establish performance benchmarks: a CNN and a CNN-Bi-GRU tie for valence classification from BVP (best average F1 0.63, AUC 0.69), a lightweight Transformer gives the most balanced arousal results, and a CNN-Bi-GRU achieves the top results for the relaxation task (average F1 0.64, AUC 0.69).
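The baselines above classify affect from continuous wearable streams, which in practice means segmenting the raw signal into fixed-length windows before feeding a model. A minimal sketch of that preprocessing step, with hypothetical parameters (the 64 Hz rate is a common wristband BVP sampling rate; the window and step lengths are illustrative, not taken from the paper):

```python
import numpy as np

def window_signal(signal, fs, win_sec, step_sec):
    """Split a 1-D physiological signal into overlapping fixed-length windows.

    fs       -- sampling rate in Hz
    win_sec  -- window length in seconds
    step_sec -- hop between consecutive windows in seconds
    """
    win = int(fs * win_sec)
    step = int(fs * step_sec)
    n = (len(signal) - win) // step + 1
    return np.stack([signal[i * step : i * step + win] for i in range(n)])

# Hypothetical example: 5 minutes of synthetic BVP at 64 Hz,
# cut into 60 s windows with 50% overlap (30 s hop).
fs = 64
bvp = np.random.randn(fs * 300)
windows = window_signal(bvp, fs, win_sec=60, step_sec=30)
print(windows.shape)  # (9, 3840): 9 windows of 3840 samples each
```

Each resulting window would then be labeled with the affective state of the session segment it came from and passed to a classifier such as the CNN-Bi-GRU baseline.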

Abstract

With the growing integration of human-computer interaction into everyday life, advances in machine learning have enabled systems to better perceive and respond to users' emotional states. Most existing affect recognition datasets focus on static environments, limiting their applicability to immersive multimedia contexts such as Virtual Reality (VR). In this paper, we introduce WARM-VR, a novel publicly available multimodal dataset designed to support affect recognition in immersive, multisensory environments using wearable sensing instrumentation. Data were collected from 31 participants aged 19-37 using wearable sensors: a wristband measuring Blood Volume Pulse (BVP), Electrodermal Activity (EDA), skin temperature, and three-axis acceleration, and a chest strap recording electrocardiogram (ECG) signals. Participants engaged in immersive VR experiences designed to elicit relaxation through a calming beach environment following stress induction via an arithmetic task. These sessions incorporated synchronized multimedia stimuli: visual, auditory, and olfactory. Affective states were assessed subjectively through validated self-report questionnaires and objectively through analysis of the physiological measurements. Statistical analysis of the questionnaires confirmed that VR relaxation significantly reduced negative affect, particularly with olfactory enhancement. Furthermore, we established a benchmark on the dataset using widely recognized machine learning algorithms. The best performance for binary valence classification from BVP data was obtained with a CNN and a CNN-Bi-GRU model, both achieving an average F1-score of 0.63 and an AUC of 0.69. For arousal, a lightweight Transformer architecture provided the most balanced results (per-class F1-scores of 0.54 and 0.63), outperforming recurrent hybrids. In the relaxation task, a CNN-Bi-GRU model reached the highest overall performance (average F1-score of 0.64, AUC of 0.69).
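The abstract reports both per-class F1-scores (one per binary label) and their average. For readers unfamiliar with that convention, a short self-contained sketch of how per-class F1 and the macro average are computed (the example labels are synthetic, not from the dataset; this mirrors scikit-learn's `f1_score` with `average="macro"`):

```python
def f1_per_class(y_true, y_pred):
    """Binary per-class F1: for each class c, treat c as the positive label."""
    scores = {}
    for c in (0, 1):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores

# Synthetic labels for illustration only.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 1]
per_class = f1_per_class(y_true, y_pred)
macro_f1 = (per_class[0] + per_class[1]) / 2
print(per_class, macro_f1)
```

Reporting both per-class scores, as the paper does for arousal, exposes class imbalance that a single averaged figure would hide.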