Stress Classification from ECG Signals Using Vision Transformer

arXiv cs.AI / 3/31/2026


Key Points

  • The paper proposes a vision-transformer approach for multilevel stress classification from ECG by converting raw signals into 2D STFT spectrograms and feeding patched inputs to a transformer encoder.
  • It addresses the challenge of inter-subject variability using leave-one-subject-out cross-validation (LOSOCV) on WESAD and RML datasets, comparing against 1D CNN and ResNet-18 baselines.
  • Experimental results indicate the vision transformer outperforms CNN-based models and prior state-of-the-art methods, showing stronger robustness to inter-subject differences.
  • The method is end-to-end and avoids handcrafted features, learning representations directly from spectrogram patch data.
  • Reported performance includes 71.01% (RML) and 76.7% (WESAD) accuracy for three-class classification and 88.3% accuracy for binary classification on WESAD.
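The ECG-to-spectrogram and patching steps summarized above can be sketched with SciPy and NumPy. The window length, overlap, and patch size below are illustrative assumptions, not values from the paper; the 700 Hz sampling rate matches WESAD's chest ECG but is likewise an assumption here.

```python
import numpy as np
from scipy.signal import stft

def ecg_to_spectrogram(ecg, fs=700, nperseg=256, noverlap=128):
    """Convert a 1D ECG segment into a 2D log-magnitude STFT spectrogram.

    fs=700 Hz corresponds to WESAD's chest ECG; window parameters are
    illustrative, not taken from the paper.
    """
    _, _, Z = stft(ecg, fs=fs, nperseg=nperseg, noverlap=noverlap)
    return np.log1p(np.abs(Z))  # shape: (freq bins, time frames)

def patchify(spec, p=16):
    """Split a spectrogram into non-overlapping p x p patches, flattened
    per patch as a transformer encoder would consume them (edges cropped
    so both dimensions divide evenly)."""
    H, W = spec.shape
    H, W = H - H % p, W - W % p
    s = spec[:H, :W]
    return s.reshape(H // p, p, W // p, p).swapaxes(1, 2).reshape(-1, p * p)

# 10 s of synthetic signal stands in for a real ECG segment
spec = ecg_to_spectrogram(np.random.randn(7000))
patches = patchify(spec)
```

Each row of `patches` is one flattened patch; in a ViT pipeline these rows would be linearly projected into token embeddings before the encoder.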

Abstract

Vision Transformers have shown tremendous success in numerous computer vision applications; however, they have not yet been exploited for stress assessment from physiological signals such as the electrocardiogram (ECG). To get the maximum benefit from the vision transformer for multilevel stress assessment, in this paper we transform the raw ECG data into 2D spectrograms using the short-time Fourier transform (STFT). These spectrograms are divided into patches that are fed to the transformer encoder. We also perform experiments with a 1D CNN and ResNet-18 (a CNN model). We perform leave-one-subject-out cross-validation (LOSOCV) experiments on the WESAD and Ryerson Multimedia Lab (RML) datasets. One of the biggest challenges of LOSOCV-based experiments is inter-subject variability. In this research, we address this issue and show the effectiveness of 2D spectrograms combined with the attention mechanism of the transformer. Experiments show that the vision transformer handles inter-subject variability much better than CNN-based models and beats all previous state-of-the-art methods by a considerable margin. Moreover, our method is end-to-end, does not require handcrafted features, and learns robust representations. The proposed method achieved accuracies of 71.01% on the RML dataset and 76.7% on the WESAD dataset for three-class classification, and 88.3% for binary classification on WESAD.
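The LOSOCV protocol described in the abstract maps directly onto scikit-learn's `LeaveOneGroupOut`, with each subject treated as one group. The arrays and subject IDs below are synthetic stand-ins, not data from WESAD or RML.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Synthetic stand-in: 6 samples drawn from 3 subjects
X = np.arange(12).reshape(6, 2)          # feature rows (e.g. spectrogram embeddings)
y = np.array([0, 1, 0, 1, 0, 1])         # stress labels
subjects = np.array([1, 1, 2, 2, 3, 3])  # one group per subject

logo = LeaveOneGroupOut()
splits = list(logo.split(X, y, groups=subjects))

for train_idx, test_idx in splits:
    held_out = subjects[test_idx][0]
    # Train on all remaining subjects, evaluate on the held-out subject;
    # the model never sees the test subject during training.
    print(f"held-out subject {held_out}: train={train_idx.tolist()}, test={test_idx.tolist()}")
```

Because every fold tests on a subject absent from training, per-fold accuracy directly measures robustness to the inter-subject variability the paper targets.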