A Multi-Modal CNN-LSTM Framework with Multi-Head Attention and Focal Loss for Real-Time Elderly Fall Detection

arXiv cs.AI / 3/25/2026


Key Points

  • The paper presents MultiModalFallDetector, a multi-modal wearable-sensor deep learning framework for real-time elderly fall detection using tri-axial accelerometer, gyroscope, and multi-channel physiological signals.
  • It combines a multi-scale CNN feature extractor, multi-head self-attention for dynamic temporal weighting, and an auxiliary activity classification task to regularize training.
  • To address class imbalance common in fall datasets, the method uses Focal Loss and applies transfer learning from UCI HAR to the SisFall dataset.
  • Experiments on SisFall report an F1-score of 98.7, recall of 98.9, and AUC-ROC of 99.4, with inference latency under 50 ms on edge devices, which the authors present as suitable for real-time deployment in geriatric care.
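To make the class-imbalance point concrete, here is a minimal sketch of binary Focal Loss for a single prediction, written in plain Python. It is not the paper's implementation: the function names and the values `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration (they are common defaults for Focal Loss in general, and the summary does not state what the authors used).

```python
import math


def cross_entropy(p_fall, label, eps=1e-7):
    """Plain binary cross-entropy for one sample (label 1 = fall, 0 = no fall)."""
    p = min(max(p_fall, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = p if label == 1 else 1.0 - p    # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p_fall, label, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss for one sample.

    gamma and alpha are illustrative assumptions, not values from the paper.
    The factor (1 - p_t) ** gamma shrinks the loss of well-classified samples,
    so the many easy non-fall windows contribute less to the gradient.
    """
    p = min(max(p_fall, eps), 1.0 - eps)
    p_t = p if label == 1 else 1.0 - p
    alpha_t = alpha if label == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # An easy fall sample (confident, correct) versus a hard one (confident, wrong).
    for p in (0.9, 0.1):
        print(p, cross_entropy(p, 1), focal_loss(p, 1))
```

Setting `gamma=0` removes the down-weighting term, leaving cross-entropy scaled by the class weight, which is a quick way to check the function.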

Abstract

The increasing global aging population has intensified the demand for reliable health monitoring systems, particularly those capable of detecting critical events such as falls among elderly individuals. Traditional fall detection approaches relying on single-modality acceleration data suffer from high false alarm rates, while conventional machine learning methods require extensive hand-crafted feature engineering. This paper proposes a novel multi-modal deep learning framework, MultiModalFallDetector, designed for real-time elderly fall detection using wearable sensors. Our approach integrates multiple innovations: a multi-scale CNN-based feature extractor capturing motion dynamics at varying temporal resolutions; fusion of tri-axial accelerometer, gyroscope, and four-channel physiological signals; incorporation of a multi-head self-attention mechanism for dynamic temporal weighting; adoption of Focal Loss to mitigate severe class imbalance; introduction of an auxiliary activity classification task for regularization; and implementation of transfer learning from UCI HAR to the SisFall dataset. Extensive experiments on the SisFall dataset, which includes real-world simulated fall trials from elderly participants (aged 60-85), demonstrate that our framework achieves an F1-score of 98.7, Recall of 98.9, and AUC-ROC of 99.4, significantly outperforming baseline methods including traditional machine learning and standard deep learning approaches. The model maintains sub-50 ms inference latency on edge devices, confirming its suitability for real-time deployment in geriatric care settings.
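The "dynamic temporal weighting" attributed to multi-head self-attention can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the authors' model: the learned query, key, and value projections are omitted (each head simply attends over its own slice of the feature vector), and the function names and toy input are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention over T time steps.

    seq is a list of T feature vectors of equal length d. Queries, keys, and
    values are all the raw inputs here (assumption: no learned projections).
    Returns (outputs, weights), where weights[i][t] is how strongly time step
    i attends to time step t.
    """
    d = len(seq[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in seq]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[t] * seq[t][j] for t in range(len(seq))) for j in range(d)]
        )
    return outputs, weights


def multi_head_self_attention(seq, num_heads):
    """Split the feature dimension into heads, attend per head, concatenate."""
    d = len(seq[0])
    if d % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = d // num_heads
    head_outputs = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        out, _ = self_attention([v[lo:hi] for v in seq])
        head_outputs.append(out)
    # Concatenate the per-head outputs back into one vector per time step.
    return [
        [x for h in range(num_heads) for x in head_outputs[h][t]]
        for t in range(len(seq))
    ]


if __name__ == "__main__":
    # Toy window: 3 time steps, 4 features each (hypothetical values).
    window = [[0.1, 0.2, 0.0, 0.1], [2.0, 1.5, 1.8, 2.2], [0.2, 0.1, 0.1, 0.0]]
    print(multi_head_self_attention(window, num_heads=2))
```

In a trained model each head would apply its own learned projections before the dot product, which lets different heads focus on different parts of the window, for example the impact spike versus the stillness that follows it.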