Bio-Inspired Self-Supervised Learning for Wrist-worn IMU Signals

arXiv cs.LG / 3/12/2026


Key Points

  • The paper introduces a tokenization strategy based on submovement theory, treating wrist motion as sequences of movement segments rather than unstructured time series.
  • It pretrains a Transformer encoder with masked movement-segment reconstruction to model temporal dependencies between segments, focusing on higher-level movement structure rather than local waveform morphology.
  • Pretraining on the NHANES dataset (about 28k hours, ~11k participants, ~10M windows) yields representations that outperform strong wearable SSL baselines on six subject-disjoint HAR benchmarks and show improved data efficiency in data-scarce settings.
  • The work emphasizes leveraging biological structure in movement for HAR and will release code and pretrained weights to the community.
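The segment tokenization described above can be sketched with a toy boundary detector. This is an illustrative assumption, not the paper's actual algorithm: it splits the smoothed acceleration-magnitude stream at local minima (candidate rest points between submovement bursts), where the function name, smoothing window, and minimum-length threshold are all hypothetical.

```python
import numpy as np

def extract_movement_segments(acc, fs=80.0, min_len=0.1):
    """Toy sketch: split a 3-axis wrist-acceleration stream into movement
    segments at local minima of the smoothed magnitude. Thresholds and
    parameters are illustrative, not those used in the paper."""
    mag = np.linalg.norm(acc, axis=1)              # acceleration magnitude
    w = max(1, int(0.25 * fs))                     # ~0.25 s smoothing window
    smooth = np.convolve(mag, np.ones(w) / w, mode="same")
    # candidate boundaries: local minima of the smoothed magnitude
    minima = [i for i in range(1, len(smooth) - 1)
              if smooth[i] <= smooth[i - 1] and smooth[i] < smooth[i + 1]]
    bounds = [0] + minima + [len(smooth)]
    # keep only segments longer than min_len seconds
    return [(a, b) for a, b in zip(bounds[:-1], bounds[1:])
            if (b - a) / fs >= min_len]

# usage: synthetic accelerometer noise with a burst of movement
rng = np.random.default_rng(0)
acc = rng.normal(0, 0.05, size=(800, 3))
acc[300:420] += np.sin(np.linspace(0, 3 * np.pi, 120))[:, None]
segments = extract_movement_segments(acc)
print(len(segments), segments[:3])
```

Each `(start, end)` pair is one movement segment, which would then be embedded and treated as a token in the Transformer's input sequence.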

Abstract

Wearable accelerometers have enabled large-scale health and wellness monitoring, yet learning robust human-activity representations has been constrained by the scarcity of labeled data. While self-supervised learning offers a potential remedy, existing approaches treat sensor streams as unstructured time series, overlooking the underlying biological structure of human movement, a factor we argue is critical for effective Human Activity Recognition (HAR). We introduce a novel tokenization strategy grounded in the submovement theory of motor control, which posits that continuous wrist motion is composed of superposed elementary basis functions called submovements. We define our token as the movement segment, a unit of motion composed of a finite sequence of submovements that is readily extractable from wrist accelerometer signals. By treating these segments as tokens, we pretrain a Transformer encoder via masked movement-segment reconstruction to model the temporal dependencies of movement segments, shifting the learning focus beyond local waveform morphology. Pretrained on the NHANES corpus (approximately 28k hours; approximately 11k participants; approximately 10M windows), our representations outperform strong wearable SSL baselines across six subject-disjoint HAR benchmarks. Furthermore, they demonstrate stronger data efficiency in data-scarce settings. Code and pretrained weights will be made publicly available.
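The masked movement-segment reconstruction objective can be illustrated with a minimal numpy sketch. Everything here is assumed for illustration: the token dimensions, the mask ratio, and the stand-in linear "encoder" (the paper uses a Transformer encoder); the point is only the shape of the objective, where loss is computed solely on masked segment tokens.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup (illustrative shapes, not the paper's architecture):
# a window is a sequence of T movement-segment tokens, each embedded in d dims.
T, d = 16, 32
tokens = rng.normal(size=(T, d))        # segment embeddings for one window

# Mask a random subset of segment tokens with a [MASK] embedding,
# then reconstruct the originals from the remaining context.
mask_ratio = 0.25
mask_tok = rng.normal(size=(d,))        # stands in for a learned [MASK] vector
masked_idx = rng.choice(T, size=int(mask_ratio * T), replace=False)
inputs = tokens.copy()
inputs[masked_idx] = mask_tok

# Stand-in "encoder": one random linear map shared across positions.
# The real model is a Transformer encoder over the whole token sequence,
# so masked positions can attend to their unmasked neighbours.
W = rng.normal(size=(d, d)) / np.sqrt(d)
recon = inputs @ W

# Reconstruction loss only on masked positions: the model must infer
# missing segments from temporal context, not copy local waveforms.
loss = np.mean((recon[masked_idx] - tokens[masked_idx]) ** 2)
print(round(float(loss), 4))
```

Because the loss targets whole movement segments rather than raw samples, the encoder is pushed to model dependencies *between* segments, which is the shift away from local waveform morphology the abstract describes.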