Learning Generalizable Action Representations via Pre-training AEMG

arXiv cs.LG / 5/6/2026


Key Points

  • The paper introduces Any Electromyography (AEMG), a large-scale self-supervised framework to learn representations of EMG that generalize better across subjects, devices, and tasks.
  • AEMG treats neuromuscular dynamics in a “linguistic” way by using a Neuromuscular Contraction Tokenizer (NCT) that converts discrete muscle contractions into structural tokens and temporal activations into sentence-like patterns.
  • The authors build a very large cross-device EMG signal vocabulary, aiming to support transfer across different channel layouts and sampling rates.
  • Experiments show AEMG boosts zero-shot leave-one-subject-out (LOSO) accuracy by 5.79–9.25% versus six state-of-the-art baselines, and exceeds 90% few-shot adaptation performance using only about 5% of target-user data.
  • Overall, the work positions EMG as a “cross-device physiological language” and lays the groundwork for a single, universally applicable EMG foundation model trained once.
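
The NCT's internals are not detailed in this summary, but the idea of turning raw EMG into discrete "words" can be made concrete with a vector-quantization-style tokenizer. The sketch below is an illustration under stated assumptions, not the authors' implementation: the window length, codebook size, and the resample-then-nearest-codeword scheme are all hypothetical choices, and the resampling step only hints at how differing channel layouts and sampling rates could be mapped into one shared token space.

```python
import numpy as np

# Hypothetical sketch of an NCT-style tokenizer: EMG windows -> discrete tokens.
# All names, sizes, and the quantization scheme are illustrative assumptions,
# not the AEMG authors' actual design.

COMMON_FS = 1000          # assumed common sampling rate (Hz)
WINDOW_MS = 200           # assumed "word" length per token
CODEBOOK_SIZE = 1024      # assumed vocabulary size

rng = np.random.default_rng(0)
# At 1 kHz, a 200 ms window is 200 samples, so each codeword is 200-dimensional.
codebook = rng.normal(size=(CODEBOOK_SIZE, WINDOW_MS))

def resample_channel(x, fs_in, fs_out=COMMON_FS):
    """Linearly resample one EMG channel to the common rate (illustrative only)."""
    t_in = np.arange(len(x)) / fs_in
    t_out = np.arange(0, t_in[-1], 1.0 / fs_out)
    return np.interp(t_out, t_in, x)

def tokenize(emg, fs_in):
    """Map a (channels, samples) EMG recording to a token 'sentence' per channel."""
    win = int(COMMON_FS * WINDOW_MS / 1000)
    sentences = []
    for ch in emg:
        ch = resample_channel(ch, fs_in)
        n_words = len(ch) // win
        words = ch[: n_words * win].reshape(n_words, win)
        # Nearest-codeword lookup stands in for whatever the NCT actually learns.
        dists = ((words[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        sentences.append(dists.argmin(axis=1))
    return np.stack(sentences)   # (channels, n_words) integer token ids

# Example: an 8-channel device at 2 kHz and a 4-channel device at 200 Hz
# both land in the same token vocabulary, regardless of layout or rate.
tokens_a = tokenize(rng.normal(size=(8, 4000)), fs_in=2000)
tokens_b = tokenize(rng.normal(size=(4, 1000)), fs_in=200)
```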

Abstract

Electromyography (EMG) plays a fundamental role in decoding human motor intent and enabling intuitive human-computer interaction. However, its generalization capability across subjects, devices, and tasks remains substantially limited by data heterogeneity, label scarcity, and the lack of a unified representational framework. To bridge this gap, we propose Any Electromyography (AEMG), the first large-scale, self-supervised representation learning framework for EMG. AEMG reconceptualizes neuromuscular dynamics linguistically, utilizing a novel Neuromuscular Contraction Tokenizer (NCT) to translate discrete muscle contractions into structural words and temporal activation patterns into coherent sentences. Furthermore, we compile the largest cross-device EMG signal vocabulary to date, enabling seamless transfer across arbitrary channel topologies and sampling rates. Experiments demonstrate that AEMG improves zero-shot leave-one-subject-out (LOSO) accuracy by 5.79-9.25% compared to six state-of-the-art baselines, and achieves more than 90% few-shot adaptation performance with only 5% of target-user data. Our work proposes the concept of EMG signals as a cross-device physiological language, learns their grammar from massive amounts of data, and lays the groundwork for a single-training, universally applicable EMG foundation model.
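The LOSO and few-shot numbers above describe a standard cross-subject evaluation protocol rather than anything unique to AEMG, so a generic sketch may help make the setup concrete. Assuming a frozen pretrained encoder and per-subject labeled data, zero-shot LOSO trains a classifier on all subjects but one and tests on the held-out subject; few-shot adaptation additionally reveals a small fraction (here 5%) of the held-out subject's labels. The function names and the linear-probe choice are assumptions for illustration, not the paper's exact protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def encode(X):
    # Placeholder for a frozen, pretrained AEMG-style encoder (assumed available).
    return X.reshape(len(X), -1)

def loso_eval(subject_data, few_shot_frac=0.05, seed=0):
    """subject_data: dict subject_id -> (X, y). Returns per-subject accuracies."""
    rng = np.random.default_rng(seed)
    zero_shot, few_shot = {}, {}
    for held_out in subject_data:
        # Zero-shot: train a probe on every subject except the held-out one.
        X_tr = np.concatenate([encode(X) for s, (X, y) in subject_data.items() if s != held_out])
        y_tr = np.concatenate([y for s, (X, y) in subject_data.items() if s != held_out])
        X_te, y_te = subject_data[held_out]
        X_te = encode(X_te)

        probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        zero_shot[held_out] = probe.score(X_te, y_te)

        # Few-shot adaptation: add ~5% of the held-out subject's labeled data,
        # then score on the remaining, untouched portion.
        n_adapt = max(1, int(few_shot_frac * len(y_te)))
        idx = rng.permutation(len(y_te))
        adapt, test = idx[:n_adapt], idx[n_adapt:]
        probe_fs = LogisticRegression(max_iter=1000).fit(
            np.concatenate([X_tr, X_te[adapt]]),
            np.concatenate([y_tr, y_te[adapt]]),
        )
        few_shot[held_out] = probe_fs.score(X_te[test], y_te[test])
    return zero_shot, few_shot
```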