Leveraging Imperfect Medical Data: A Manifold-Consistent Spatio-Temporal Network for Sensor-based Human Activity Recognition

arXiv cs.CV / 5/5/2026


Key Points

  • The paper targets sensor-based human activity recognition (HAR) for healthcare monitoring, where wearable/IoMT signals are often incomplete or corrupted by missing data, sensor failures, and noise.
  • It introduces a Manifold-Consistent Spatio-Temporal Network (MCSTN) that models realistic sensing imperfections using a dual-level corruption approach (physical-level corruption and diffusion-driven continuous corruption).
  • MCSTN improves robustness by enforcing representation consistency across multiple corrupted views so the learned semantics remain stable and corruption-invariant.
  • The model uses a dual-stream spatio-temporal architecture that separates temporal-dynamics learning from spatial correlation learning across sensors to strengthen spatio-temporal representations.
  • Experiments on PAMAP2, Opportunity, and WISDM show MCSTN achieves competitive results, with particular gains when inputs are imperfect, supporting its suitability for real-world wearable IoMT deployments.
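
The dual-level corruption idea in the points above can be illustrated with a small sketch. This is not the paper's implementation: the function names, the zero-filling of missing samples, the linear noise schedule, and all default values are assumptions chosen for illustration. The first function mimics physical-level faults (dropped samples, a dead sensor channel, additive noise); the second applies a standard DDPM-style forward noising step as one plausible reading of "diffusion-driven continuous corruption".

```python
import math
import random


def physical_corruption(window, drop_prob=0.1, dead_channel=None,
                        noise_std=0.05, seed=0):
    """Return a corrupted copy of `window` (list of time steps, each a list
    of channel readings). All parameters are illustrative assumptions."""
    rng = random.Random(seed)
    corrupted = []
    for step in window:
        row = []
        for channel, value in enumerate(step):
            if channel == dead_channel:
                row.append(0.0)  # simulated sensor failure: channel reads zero
            elif rng.random() < drop_prob:
                row.append(0.0)  # simulated missing measurement, zero-filled
            else:
                row.append(value + rng.gauss(0.0, noise_std))  # additive noise
        corrupted.append(row)
    return corrupted


def diffusion_corruption(window, t, num_steps=100, beta_min=1e-4,
                         beta_max=0.02, seed=0):
    """DDPM-style forward noising with an assumed linear beta schedule:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    if not 0 <= t <= num_steps:
        raise ValueError("t must lie in [0, num_steps]")
    rng = random.Random(seed)
    alpha_bar = 1.0
    for s in range(t):
        beta = beta_min + (beta_max - beta_min) * s / (num_steps - 1)
        alpha_bar *= 1.0 - beta
    scale = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [[scale * v + sigma * rng.gauss(0.0, 1.0) for v in step]
            for step in window]


# Hypothetical 3-step, 3-channel window (e.g. one accelerometer triple).
window = [[0.1, 9.8, 0.3], [0.2, 9.7, 0.4], [0.1, 9.9, 0.2]]
view_a = physical_corruption(window, dead_channel=2)
view_b = diffusion_corruption(window, t=30)
```

Here `view_a` and `view_b` are two corrupted views of the same clean window; a larger `t` moves the diffusion view continuously further from the original signal, while the physical view applies discrete faults.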

Abstract

Sensor-based Human Activity Recognition (HAR) has attracted increasing attention in medical and healthcare monitoring, particularly with the growth of the Internet of Medical Things (IoMT). However, in real-world wearable sensing scenarios, IoMT signals are often corrupted by missing measurements, sensor failures, and environmental noise, which significantly degrade the performance of conventional deep learning models that assume clean and complete inputs. To address this challenge, we propose a Manifold-Consistent Spatio-Temporal Network (MCSTN) for robust HAR under imperfect sensing conditions. The proposed framework introduces a dual-level corruption modeling mechanism that simulates realistic sensor imperfections through both physical-level corruption and diffusion-driven continuous corruption. By enforcing representation consistency across multiple corrupted views, the model learns stable and corruption-invariant semantic representations. Furthermore, we design a dual-stream spatio-temporal architecture that explicitly decouples temporal dynamics modeling from spatial correlation learning. The temporal stream captures long-term activity dynamics, while the spatial stream models inter-sensor relationships, enabling more effective spatio-temporal representation learning. Extensive experiments on three widely used HAR benchmark datasets, PAMAP2, Opportunity, and WISDM, demonstrate that the proposed MCSTN achieves competitive performance compared with existing state-of-the-art methods, particularly under imperfect sensing conditions. These results validate the effectiveness and robustness of the proposed framework for real-world wearable IoMT sensing applications.
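
To make the consistency objective and the two-stream split more concrete, here is a toy sketch. The abstract does not specify the loss or the layers, so everything below is an assumption: handcrafted statistics stand in for the learned temporal and spatial streams, and a pairwise cosine distance stands in for whatever consistency term MCSTN actually uses. All function names are hypothetical.

```python
import math


def temporal_features(window):
    """Stand-in for a temporal stream: per-channel mean and mean absolute
    first difference over time."""
    n_steps, n_channels = len(window), len(window[0])
    feats = []
    for c in range(n_channels):
        series = [window[t][c] for t in range(n_steps)]
        diffs = [abs(series[t + 1] - series[t]) for t in range(n_steps - 1)]
        feats.append(sum(series) / n_steps)
        feats.append(sum(diffs) / max(len(diffs), 1))
    return feats


def spatial_features(window):
    """Stand-in for a spatial stream: covariance of every channel pair."""
    n_steps, n_channels = len(window), len(window[0])
    means = [sum(row[c] for row in window) / n_steps for c in range(n_channels)]
    feats = []
    for i in range(n_channels):
        for j in range(i + 1, n_channels):
            cov = sum((row[i] - means[i]) * (row[j] - means[j])
                      for row in window) / n_steps
            feats.append(cov)
    return feats


def encode(window):
    """Concatenate the two streams into one embedding."""
    return temporal_features(window) + spatial_features(window)


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def consistency_loss(embeddings):
    """Mean (1 - cosine similarity) over all pairs of view embeddings."""
    pairs = [(i, j) for i in range(len(embeddings))
             for j in range(i + 1, len(embeddings))]
    if not pairs:
        return 0.0
    return sum(1.0 - cosine_similarity(embeddings[i], embeddings[j])
               for i, j in pairs) / len(pairs)


# Hypothetical clean window and a view with one zeroed (failed) channel.
clean = [[0.0, 1.0, 2.0], [1.0, 1.5, 1.0], [2.0, 2.0, 0.0], [3.0, 2.5, -1.0]]
faulty = [[row[0], row[1], 0.0] for row in clean]
loss = consistency_loss([encode(clean), encode(faulty)])
```

In a trainable model the encoder would be learned, and minimizing a term like `loss` alongside the classification objective would pull embeddings of differently corrupted views of the same window together, which is the sense in which the representation becomes corruption-invariant.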