Development of ML model for triboelectric nanogenerator based sign language detection system

arXiv cs.AI / 4/10/2026


Key Points

  • The paper develops and benchmarks machine learning and deep learning models for a TENG-based sensor glove that recognizes 11 sign classes (digits 1–5 and letters A–F) using five flex sensors.
  • A custom MFCC CNN-LSTM architecture reaches 93.33% accuracy and 95.56% precision, a 23-point gain over the best traditional ML baseline (Random Forest, 70.38%), by extracting frequency-domain (MFCC) features per sensor, processing them through parallel CNN branches, and fusing the branch outputs for temporal modeling.
  • Ablation experiments show that 50-timestep input windows provide a better balance of temporal context and training-data volume than 100-timestep windows (84.13% vs 58.06% accuracy).
  • The authors find that MFCC frequency-domain representations improve invariance to execution speed by mapping temporal variations to more stable spectral features, and they emphasize that data augmentation (time warping and noise injection) is important for generalization.
  • Overall, the results suggest that frequency-domain feature extraction combined with parallel multi-sensor deep architectures can outperform both classical ML and time-domain deep learning for wearable gesture recognition in assistive technology.
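To make the MFCC pipeline concrete, the sketch below computes MFCC-style features from a single flex-sensor stream using only numpy and scipy. All parameters (100 Hz sample rate, 25-sample frames, 10-sample hop, 20 mel filters, 12 coefficients) are illustrative assumptions, not the paper's settings; the point is the structure: framing, power spectrum, mel filterbank, log, DCT.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr=100, frame_len=25, hop=10, n_mels=20, n_coeffs=12):
    """Toy MFCC pipeline for a 1-D sensor stream.
    All parameter defaults are illustrative guesses, not the paper's values."""
    # 1. Frame the signal into overlapping windows and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2. Power spectrum via zero-padded FFT.
    n_fft = 64
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2     # (n_frames, 33)
    # 3. Triangular mel filterbank, spaced evenly on the mel scale.
    mel_pts = np.linspace(0, 2595 * np.log10(1 + (sr / 2) / 700), n_mels + 2)
    hz_pts = 700 * (10 ** (mel_pts / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    # 4. Log mel energies, then DCT to decorrelate -> cepstral coefficients.
    log_mel = np.log(power @ fbank.T + 1e-10)
    return dct(log_mel, type=2, axis=-1, norm='ortho')[:, :n_coeffs]
```

Because the DCT summarizes the *shape* of the spectrum rather than where in time events occur, two executions of the same sign at different speeds yield similar coefficient patterns, which is the execution-speed invariance the paper attributes to MFCC features.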

Abstract

Sign language recognition (SLR) is vital for bridging communication gaps between deaf and hearing communities. Vision-based approaches suffer from occlusion, computational costs, and physical constraints. This work presents a comparison of machine learning (ML) and deep learning models for a custom triboelectric nanogenerator (TENG)-based sensor glove. Utilizing multivariate time-series data from five flex sensors, the study benchmarks traditional ML algorithms, feedforward neural networks, LSTM-based temporal models, and a multi-sensor MFCC CNN-LSTM architecture across 11 sign classes (digits 1-5, letters A-F). The proposed MFCC CNN-LSTM architecture processes frequency-domain features from each sensor through independent convolutional branches before fusion. It achieves 93.33% accuracy and 95.56% precision, a 23-point improvement over the best ML algorithm (Random Forest: 70.38%). Ablation studies reveal that 50-timestep windows offer a better tradeoff between temporal context and training data volume, yielding 84.13% accuracy compared to 58.06% with 100-timestep windows. MFCC feature extraction maps temporal variations to execution-speed-invariant spectral representations, and data augmentation methods (time warping, noise injection) prove essential for generalization. The results demonstrate that frequency-domain feature representations combined with parallel multi-sensor processing architectures outperform both classical algorithms and time-domain deep learning for wearable sensor-based gesture recognition, supporting assistive technology development.
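The window-length ablation reduces to a simple counting argument: from a fixed-length recording, shorter windows yield proportionally more training samples at the cost of less temporal context per sample. The sketch below, with a synthetic 1,000-step recording and non-overlapping segmentation (the paper's exact segmentation scheme is an assumption here), makes the tradeoff visible.

```python
import numpy as np

def make_windows(stream, win, hop=None):
    """Segment a (T, n_sensors) recording into fixed-length windows.
    hop defaults to win (non-overlapping); the paper's actual
    segmentation parameters are not specified here."""
    hop = hop or win
    starts = range(0, stream.shape[0] - win + 1, hop)
    return np.stack([stream[s:s + win] for s in starts])

# Synthetic stand-in for a 1,000-timestep, 5-flex-sensor recording.
rec = np.random.randn(1000, 5)
short = make_windows(rec, 50)    # (20, 50, 5): 20 training samples
long_ = make_windows(rec, 100)   # (10, 100, 5): half as many samples
```

Halving the sample count while doubling per-sample context is consistent with the reported drop from 84.13% to 58.06% accuracy when moving from 50- to 100-timestep windows: the deeper model is left with less data to fit more parameters per input.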
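The two augmentation methods named in the abstract, time warping and noise injection, can be sketched in a few lines of numpy. The warp factor, noise level, and linear-interpolation scheme below are illustrative choices, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(x, sigma=0.05):
    """Gaussian noise injection; sigma is an illustrative level."""
    return x + rng.normal(0.0, sigma, x.shape)

def time_warp(x, factor):
    """Uniform time warp for a (T, n_sensors) window: linearly
    resample each channel to a new speed, then back to length T
    so downstream input shapes stay fixed."""
    T, n = x.shape
    src = np.linspace(0, T - 1, int(round(T * factor)))
    warped = np.stack([np.interp(src, np.arange(T), x[:, j])
                       for j in range(n)], axis=1)
    dst = np.linspace(0, warped.shape[0] - 1, T)
    return np.stack([np.interp(dst, np.arange(warped.shape[0]), warped[:, j])
                     for j in range(n)], axis=1)

win = rng.normal(size=(50, 5))          # one 50-timestep, 5-sensor window
aug = add_noise(time_warp(win, 1.2))    # slowed-down, noisy variant
```

Both transforms mimic natural variation in glove data: time warping imitates signs performed at different speeds, and noise injection imitates sensor jitter, so a model trained on augmented windows is pushed toward the execution-speed and noise invariance the paper reports.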