Channel-Free Human Activity Recognition via Inductive-Bias-Aware Fusion Design for Heterogeneous IoT Sensor Environments

arXiv cs.LG / 4/24/2026


Key Points

  • The paper targets human activity recognition (HAR) in heterogeneous IoT settings where sensor types, body locations, modalities, and channel compositions vary across datasets and devices.
  • It proposes strict “channel-free” HAR, using a single shared model that does not assume a fixed number, order, or semantics of input channels and avoids sensor- or dataset-specific channel templates.
  • The core method is an inductive-bias-aware fusion design: each channel is encoded independently by a shared encoder, the per-channel outputs are combined by metadata-conditioned late fusion using conditional batch normalization, and channel-level and fused predictions are optimized jointly through a combination loss.
  • The evaluation combines experiments on PAMAP2 with robustness analysis on six HAR datasets, ablation studies, sensitivity analysis, efficiency evaluation, and cross-dataset transfer learning. The authors report three main findings from these experiments, which this summary attributes to the fusion strategy and metadata conditioning.
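
As an illustration of the conditional batch normalization mentioned in the key points, here is a minimal sketch in plain Python. It normalizes per-channel feature vectors with batch statistics, then applies a scale and shift selected by each channel's metadata. The function name `conditional_batch_norm`, the lookup tables `gamma_table` and `beta_table`, and the use of a (location, modality, axis) tuple as the key are all assumptions made for this example. The summary does not say how the paper parameterizes the conditioning or where in the network it is applied.

```python
import math


def conditional_batch_norm(features, meta_keys, gamma_table, beta_table, eps=1e-5):
    """Batch-normalize feature vectors, then scale/shift them by metadata.

    features:    list of N feature vectors, each a list of D floats
    meta_keys:   list of N hashable metadata keys (hypothetical format,
                 e.g. ("wrist", "acc", "x"))
    gamma_table: dict mapping a metadata key to D scale values
    beta_table:  dict mapping a metadata key to D shift values
    """
    n = len(features)
    dim = len(features[0])

    # Batch statistics, computed separately for every feature dimension.
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in features) / n for d in range(dim)]

    out = []
    for vec, key in zip(features, meta_keys):
        # The affine parameters depend on the metadata, not on channel position.
        gamma, beta = gamma_table[key], beta_table[key]
        out.append([
            gamma[d] * (vec[d] - mean[d]) / math.sqrt(var[d] + eps) + beta[d]
            for d in range(dim)
        ])
    return out


# Usage: two channels with different (assumed) metadata.
wrist = ("wrist", "acc", "x")
chest = ("chest", "gyro", "z")
normalized = conditional_batch_norm(
    features=[[1.0, 2.0], [3.0, 6.0]],
    meta_keys=[wrist, chest],
    gamma_table={wrist: [1.0, 1.0], chest: [2.0, 2.0]},
    beta_table={wrist: [0.0, 0.0], chest: [1.0, 1.0]},
)
```

In a trained model the scale and shift would typically be produced by a learned function of the metadata rather than a fixed table; the table is used here only to keep the sketch self-contained.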

Abstract

Human activity recognition (HAR) in Internet of Things (IoT) environments must cope with heterogeneous sensor settings that vary across datasets, devices, body locations, sensing modalities, and channel compositions. This heterogeneity makes conventional channel-fixed models difficult to reuse across sensing environments because their input representations are tightly coupled to predefined channel structures. To address this problem, we investigate strict channel-free HAR, in which a single shared model performs inference without assuming a fixed number, order, or semantic arrangement of input channels, and without relying on sensor-specific input layers or dataset-specific channel templates. We argue that fusion design is the central issue in this setting. Accordingly, we propose a channel-free HAR framework that combines channel-wise encoding with a shared encoder, metadata-conditioned late fusion via conditional batch normalization, and joint optimization of channel-level and fused predictions through a combination loss. The proposed model processes each channel independently to handle varying channel configurations, while sensor metadata such as body location, modality, and axis help recover structural information that channel-independent processing alone cannot retain. In addition, the joint loss encourages both the discriminability of individual channels and the consistency of the final fused prediction. Experiments on PAMAP2, together with robustness analysis on six HAR datasets, ablation studies, sensitivity analysis, efficiency evaluation, and cross-dataset transfer learning, demonstrate three main findings...
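
The joint objective described in the abstract can be sketched as follows. This is a hedged illustration, not the paper's formulation: it assumes late fusion by averaging per-channel logits and a weighted sum of the mean channel-level cross-entropy and the fused cross-entropy. The weight `alpha` and the helper names `fuse_logits` and `combination_loss` are hypothetical. The sketch shows that such a loss is defined for any number of channels and does not depend on their order.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-likelihood of the true class index `label`."""
    return -math.log(softmax(logits)[label])


def fuse_logits(channel_logits):
    """Assumed late fusion: average the logits of all available channels."""
    n = len(channel_logits)
    return [sum(column) / n for column in zip(*channel_logits)]


def combination_loss(channel_logits, label, alpha=0.5):
    """Weighted sum of channel-level and fused losses (alpha is an assumption)."""
    channel_term = sum(
        cross_entropy(logits, label) for logits in channel_logits
    ) / len(channel_logits)
    fused_term = cross_entropy(fuse_logits(channel_logits), label)
    return alpha * channel_term + (1 - alpha) * fused_term


# Usage: the same function handles three channels, reordered channels,
# or a single channel, with no channel template.
a, b, c = [2.0, 0.5, -1.0], [0.1, 0.3, 0.2], [1.5, -0.5, 0.0]
loss_three = combination_loss([a, b, c], label=0)
loss_permuted = combination_loss([c, a, b], label=0)
loss_single = combination_loss([a], label=0)
```

The channel-level term rewards each channel for being discriminative on its own, while the fused term rewards a consistent final prediction, which mirrors the two goals the abstract assigns to the joint loss.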
