Multilevel neural networks with dual-stage feature fusion for human activity recognition

arXiv cs.CV / 4/21/2026


Key Points

  • The paper proposes a two-level neural network framework for human activity recognition (HAR) with dual-stage feature fusion: late fusion combines the outputs of the first network level, and intermediate fusion integrates features from both the first and second levels.
  • It investigates how structural arrangement and fusion choices affect performance by evaluating 15 architectures built from CNNs, LSTMs, and convolutional LSTMs, each using late fusion with and without intermediate fusion.
  • Experiments on two public benchmark datasets show that using both intermediate and late fusion yields higher accuracy than relying on late fusion alone.
  • The best-performing configuration also surpasses baseline models, supporting the framework’s effectiveness for improving HAR accuracy.

Abstract

Human activity recognition (HAR) refers to the process of identifying human actions and activities using data collected from sensors. Neural networks, such as convolutional neural networks (CNNs), long short-term memory (LSTM) networks, convolutional LSTM, and their hybrid combinations, have demonstrated exceptional performance in various research domains. Developing a multilevel individual or hybrid model for HAR involves strategically integrating multiple networks to capitalize on their complementary strengths. The structural arrangement of these components is a critical factor influencing the overall performance. This study explores a novel framework of a two-level network architecture with dual-stage feature fusion: late fusion, which combines the outputs from the first network level, and intermediate fusion, which integrates the features from both the first and second levels. We evaluated 15 different network architectures of CNNs, LSTMs, and convolutional LSTMs, incorporating late fusion with and without intermediate fusion, to identify the optimal configuration. Experimental evaluation on two public benchmark datasets demonstrates that architectures incorporating both late and intermediate fusion achieve higher accuracy than those relying on late fusion alone. Moreover, the optimal configuration outperforms baseline models, thereby validating its effectiveness for HAR.
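The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration of one possible reading, not the authors' implementation: it assumes two parallel first-level branches whose outputs are concatenated (late fusion) and passed to a second-level network, with an optional step that concatenates first-level and second-level features before the classifier (intermediate fusion). All names (`make_dense`, `build_two_level_model`), the layer sizes, and the toy dense layers standing in for CNN/LSTM blocks are hypothetical.

```python
import math
import random


def make_dense(in_dim, out_dim, rng, activation=math.tanh):
    """Toy dense layer with fixed random weights (a stand-in for a CNN/LSTM block)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]

    def layer(x):
        assert len(x) == in_dim, "unexpected input size"
        pre = [sum(w * v for w, v in zip(row, x)) for row in weights]
        return [activation(p) for p in pre] if activation else pre

    return layer


def concat(*vectors):
    """Concatenate feature vectors end to end."""
    out = []
    for v in vectors:
        out.extend(v)
    return out


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def build_two_level_model(in_dim, n_classes, use_intermediate_fusion, seed=0):
    """Return a forward function for the assumed two-level layout."""
    rng = random.Random(seed)
    # Level 1: two parallel branches (hypothetical sizes).
    branch_a = make_dense(in_dim, 8, rng)
    branch_b = make_dense(in_dim, 8, rng)
    # Level 2 consumes the late-fused level-1 outputs (8 + 8 = 16 features).
    level2 = make_dense(16, 6, rng)
    # With intermediate fusion, the classifier sees level-1 and level-2 features.
    head_in = 16 + 6 if use_intermediate_fusion else 6
    head = make_dense(head_in, n_classes, rng, activation=None)

    def forward(window):
        late = concat(branch_a(window), branch_b(window))  # late fusion
        f2 = level2(late)
        fused = concat(late, f2) if use_intermediate_fusion else f2  # intermediate fusion
        return softmax(head(fused))

    return forward


# Usage: class probabilities for one flattened sensor window (made-up values).
window = [0.1 * i for i in range(12)]
late_only = build_two_level_model(12, 4, use_intermediate_fusion=False)
dual_stage = build_two_level_model(12, 4, use_intermediate_fusion=True)
print(late_only(window))
print(dual_stage(window))
```

The only structural difference between the two variants is the input to the classifier head, which mirrors the comparison the study reports: late fusion alone versus late fusion plus intermediate fusion. The weights here are random and untrained, so the printed probabilities carry no meaning beyond showing that both paths produce a valid distribution.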