PPG-Based Affect Recognition with Long-Range Deep Models: A Measurement-Driven Comparison of CNN, Transformer, and Mamba Architectures

arXiv cs.LG · April 30, 2026


Key Points

  • The paper evaluates four deep learning architectures—CNN, CNN-LSTM, Transformers, and Mamba—for classifying affect states (arousal, valence, relaxation) from wrist-based photoplethysmography (PPG) signals.
  • Using identical preprocessing, segmentation, and training pipelines with subject-independent 5-fold cross-validation, the study directly tests whether long-range sequence models bring advantages over CNN/LSTM baselines on small, noisy datasets.
  • Results show Transformers and Mamba reach performance comparable to the CNN baseline, but they do not consistently beat CNNs across all affect recognition tasks.
  • Overall, CNNs are the most effective, achieving the highest accuracy with the smallest model size, while Transformers offer a better balance of F1 scores for arousal and relaxation.
  • The work is positioned as the first evaluation of Transformer and Mamba for PPG-based affect recognition, providing guidance for selecting models in wearable affective monitoring systems.
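The subject-independent protocol above means each test fold holds out entire subjects, so a model is never evaluated on PPG segments from a person it saw during training. A minimal sketch of such a split in plain Python (a hypothetical helper, not code from the paper):

```python
from collections import defaultdict

def subject_independent_folds(sample_subjects, n_folds=5):
    """Split sample indices into folds so that no subject appears on
    both the train and test side of any fold."""
    # Collect the sample indices belonging to each subject.
    by_subject = defaultdict(list)
    for idx, subj in enumerate(sample_subjects):
        by_subject[subj].append(idx)
    subjects = sorted(by_subject)
    folds = []
    for k in range(n_folds):
        # Assign every n_folds-th subject to the test side of fold k.
        test_subjects = set(subjects[k::n_folds])
        test_idx = [i for s in test_subjects for i in by_subject[s]]
        train_idx = [i for s in subjects if s not in test_subjects
                     for i in by_subject[s]]
        folds.append((sorted(train_idx), sorted(test_idx)))
    return folds

# Example: 10 subjects with 3 PPG segments each.
subjects = [s for s in range(10) for _ in range(3)]
for train_idx, test_idx in subject_independent_folds(subjects):
    held_out = {subjects[i] for i in test_idx}
    # Each fold holds out whole subjects (here 2 of 10 per fold).
    print(sorted(held_out))
```

Library utilities such as scikit-learn's `GroupKFold` implement the same idea; the point is that splitting by segment alone would leak subject-specific signal characteristics into the test set.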

Abstract

Photoplethysmography (PPG) is increasingly used in wearable affective computing due to its low cost and ease of integration into consumer devices. Recent advances in deep learning have introduced long-range sequence models such as Transformers and state-space models such as Mamba, which have demonstrated strong performance on natural language and general time-series tasks. However, it remains unclear whether these architectures offer tangible benefits over widely used Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs) for PPG-based affect recognition, given that datasets in this domain are typically small and noisy. This work presents a measurement-driven comparison of four deep learning architectures (CNN, a CNN-LSTM hybrid, Transformer, and Mamba) for classifying arousal, valence, and relaxation states from wrist-based PPG signals. All models are evaluated under a subject-independent 5-fold cross-validation protocol using identical preprocessing, segmentation, and training pipelines. Our results show that the Transformer and Mamba models achieve performance comparable to that of the CNN baseline but do not consistently outperform it across all tasks. CNNs remain the most effective overall, providing the highest accuracy with the smallest model size, whereas Transformers offer a better balance of F1 scores for arousal and relaxation. The study provides the first evaluation of Transformer and Mamba models for PPG-based affect recognition, offering practical guidance on model selection for wearable affective monitoring systems.
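The shared pipeline segments each PPG recording into fixed-length windows before classification. A minimal sliding-window sketch of that step, where the sampling rate, window length, and overlap are illustrative assumptions rather than the paper's actual settings:

```python
def segment_signal(signal, fs, window_s=8.0, overlap=0.5):
    """Slice a 1-D PPG signal into fixed-length, overlapping windows.

    fs: sampling rate in Hz; window_s: window length in seconds;
    overlap: fraction of each window shared with the next one.
    (Values are illustrative, not the paper's configuration.)
    """
    win = int(window_s * fs)
    step = int(win * (1.0 - overlap))
    # Keep only full windows; a trailing partial window is dropped.
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, step)]

# 20 s of synthetic signal sampled at 64 Hz.
ppg = [0.0] * 1280
segments = segment_signal(ppg, fs=64)
# 8 s windows (512 samples) with 50% overlap -> windows start at
# samples 0, 256, 512, and 768, giving four segments.
```

Each window then becomes one training sample, which is why the subject-independent split matters: neighboring windows from the same person are highly correlated.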