ActivityNarrated: An Open-Ended Narrative Paradigm for Wearable Human Activity Understanding

arXiv cs.LG / 4/2/2026

💬 Opinion | Ideas & Deep Analysis | Models & Research

Key Points

  • The paper argues wearable human activity recognition (HAR) should shift from closed-set classification to an open-ended, narrative-based formulation that reflects real-world, unscripted, personalized behavior.
  • It proposes an open-vocabulary framework that aligns multi-position wearable sensor streams with free-form, time-aligned natural-language narratives so activity semantics can emerge without a predefined label set.
  • The approach includes (1) a naturalistic data collection/annotation pipeline, (2) a retrieval-based evaluation that measures semantic alignment between sensor data and language, and (3) a language-conditioned learning architecture for sensor-to-text inference.
  • Experiments report that fixed-label models degrade under participant and sensor-placement variability, while open-vocabulary sensor-language alignment yields more robust representations and improves downstream closed-set recognition: under cross-participant evaluation, the method reaches 65.3% Macro-F1, compared with 31–34% for strong closed-set HAR baselines.
  • The work positions narrative sensor-language alignment as a foundation for real-world wearable HAR, where closed-set recognition can be treated as a downstream special case after the alignment is learned.
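To make the retrieval-based evaluation in point (2) concrete, here is a minimal sketch of a Recall@K score for sensor-to-text retrieval. It is not the paper's evaluation code: the function names, the use of cosine similarity, and the convention that row i of each embedding matrix belongs to the same sensor window and narrative pair are all assumptions made for illustration, and the embeddings are synthetic.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def recall_at_k(sensor_emb, text_emb, k=1):
    """Fraction of sensor windows whose paired narrative ranks in the top k.

    Assumption: row i of sensor_emb and row i of text_emb form a matched pair.
    """
    sim = l2_normalize(sensor_emb) @ l2_normalize(text_emb).T  # (N, N) cosine scores
    topk = np.argsort(-sim, axis=1)[:, :k]                     # best k texts per window
    targets = np.arange(sim.shape[0])[:, None]                 # index of the true text
    return float(np.mean(np.any(topk == targets, axis=1)))


# Synthetic stand-ins for encoder outputs (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(8, 16))
sensor_emb = text_emb + 0.01 * rng.normal(size=(8, 16))  # nearly aligned pairs
print("Recall@1:", recall_at_k(sensor_emb, text_emb, k=1))
```

A metric of this kind needs no fixed label set: any free-form narrative can act as a retrieval candidate, which is what lets the evaluation stay open-vocabulary.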

Abstract

Wearable HAR has improved steadily, but most progress still relies on closed-set classification, which limits real-world use. In practice, human activity is open-ended, unscripted, personalized, and often compositional, unfolding as narratives rather than instances of fixed classes. We argue that addressing this gap does not require simply scaling datasets or models. It requires a fundamental shift in how wearable HAR is formulated, supervised, and evaluated. This work shows how to model open-ended activity narratives by aligning wearable sensor data with natural-language descriptions in an open-vocabulary setting. Our framework has three core components. First, we introduce a naturalistic data collection and annotation pipeline that combines multi-position wearable sensing with free-form, time-aligned narrative descriptions of ongoing behavior, allowing activity semantics to emerge without a predefined vocabulary. Second, we define a retrieval-based evaluation framework that measures semantic alignment between sensor data and language, enabling principled evaluation without fixed classes while also subsuming closed-set classification as a special case. Third, we present a language-conditioned learning architecture that supports sensor-to-text inference over variable-length sensor streams and heterogeneous sensor placements. Experiments show that models trained with fixed-label objectives degrade sharply under real-world variability, while open-vocabulary sensor-language alignment yields robust and semantically grounded representations. Once this alignment is learned, closed-set activity recognition becomes a simple downstream task. Under cross-participant evaluation, our method achieves 65.3% Macro-F1, compared with 31-34% for strong closed-set HAR baselines. These results establish open-ended narrative modeling as a practical and effective foundation for real-world wearable HAR.
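The claim that closed-set recognition becomes a downstream special case can be illustrated with a small sketch: treat each class name as a text query, assign every sensor window to the most similar label embedding, and score the result with Macro-F1. This is one plausible reading of the abstract, not the authors' implementation. The helper names, the cosine-similarity rule, and the toy embeddings below are assumptions; real label embeddings would come from a text encoder.

```python
import numpy as np


def classify_by_retrieval(sensor_emb, label_emb):
    """Pick, for each sensor window, the label whose text embedding is most similar."""
    s = sensor_emb / np.linalg.norm(sensor_emb, axis=1, keepdims=True)
    t = label_emb / np.linalg.norm(label_emb, axis=1, keepdims=True)
    return np.argmax(s @ t.T, axis=1)


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


# Toy example with three hypothetical classes; embeddings are made up.
label_emb = np.eye(3)
sensor_emb = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.1],
    [0.1, 0.2, 0.7],
    [0.6, 0.5, 0.0],  # true class 1, but closer to label 0
])
y_true = np.array([0, 1, 2, 1])
y_pred = classify_by_retrieval(sensor_emb, label_emb)
print("predictions:", y_pred.tolist())
print("Macro-F1:", round(macro_f1(y_true, y_pred, num_classes=3), 4))
```

Because Macro-F1 averages per-class scores without weighting by class frequency, a single misclassified window from a small class lowers the score noticeably, which is why the metric is common in HAR settings with imbalanced activities.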