Gait Recognition via Deep Residual Networks and Multi-Branch Feature Fusion

arXiv cs.CV / 5/1/2026

📰 News · Models & Research

Key Points

  • The paper introduces a high-precision gait recognition framework aimed at improving biometric identification for surveillance and security while addressing covariate interference such as viewpoint, clothing, and carrying conditions.
  • It uses HRNet to estimate skeletal keypoints with high spatial fidelity, then extracts three complementary feature branches (body proportion, gait velocity, and skeletal motion) from pose sequences.
  • A ResNet-50-based deep feature extraction module learns hierarchically rich and discriminative representations from the motion data.
  • A Multi-Branch Feature Fusion (MFF) module, inspired by channel-wise attention, dynamically weights and fuses the heterogeneous feature streams for more effective recognition.
  • On the CASIA-B cross-view, multi-condition benchmark, the method reports 94.52% Rank-1 accuracy for normal walking and achieves the best performance among skeleton-based methods for the coat-wearing condition.
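The channel-attention-style fusion described above can be sketched in a few lines. The following is a minimal NumPy illustration, not the paper's implementation: the gate architecture (a small squeeze-and-excitation-style MLP with softmax weights over branches), the random weight matrices `W1`/`W2`, and all dimensions are assumptions for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MultiBranchFusion:
    """Hypothetical sketch of the MFF idea: learn one gate per branch
    (squeeze-and-excitation style) and take a weighted sum of the branch
    embeddings. W1 and W2 are stand-ins for learned parameters."""
    def __init__(self, num_branches, dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((num_branches * dim, hidden)) * 0.1
        self.W2 = rng.standard_normal((hidden, num_branches)) * 0.1

    def __call__(self, branches):
        # branches: list of (batch, dim) arrays, e.g. the body-proportion,
        # gait-velocity, and skeletal-motion embeddings
        x = np.concatenate(branches, axis=1)         # "squeeze": global descriptor
        h = np.maximum(x @ self.W1, 0.0)             # "excitation" MLP (ReLU)
        w = softmax(h @ self.W2, axis=1)             # one convex weight per branch
        stacked = np.stack(branches, axis=1)         # (batch, num_branches, dim)
        return (w[..., None] * stacked).sum(axis=1)  # weighted fusion
```

Because the softmax makes the per-branch weights convex, the fused embedding always lies inside the convex hull of the branch embeddings; in a trained network the gate would learn to up-weight whichever branch is most reliable under the current covariates.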

Abstract

Gait recognition has emerged as a compelling biometric modality for surveillance and security applications, offering inherent advantages such as non-intrusiveness, resistance to disguise, and long-range identification capability. However, prevailing approaches struggle to comprehensively capture and exploit the rich biometric cues embedded in human locomotion, particularly under covariate interference including viewpoint variation, clothing change, and carrying conditions. In this paper, we present a high-precision gait recognition framework that deeply extracts and synergistically fuses gait dynamics with body shape characteristics through a multi-branch architecture grounded in deep residual learning. Specifically, we first employ the High-Resolution Network (HRNet) to perform robust skeletal keypoint estimation, preserving fine-grained spatial information even under low-resolution inputs. We then construct three complementary feature branches (body proportion, gait velocity, and skeletal motion) from the extracted pose sequences. A 50-layer Residual Network (ResNet-50) backbone is leveraged within a deep feature extraction module to capture hierarchically rich and discriminative representations. To effectively integrate heterogeneous feature streams, we design a Multi-Branch Feature Fusion (MFF) module inspired by channel-wise attention mechanisms, which dynamically allocates contribution weights across branches through learned activation parameters. Extensive experiments on the cross-view multi-condition CASIA-B benchmark demonstrate that our method achieves a Rank-1 accuracy of 94.52% under normal walking, with the best recognition performance among skeleton-based methods for the coat-wearing condition.
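To make the three branches concrete, here is a hedged sketch of how such features could be derived from an HRNet pose sequence. The COCO-17 keypoint ordering (shoulders at indices 5/6, hips at 11/12, ankles at 15/16), the specific ratios, and the frame rate are all assumptions for illustration; the paper's exact feature definitions may differ.

```python
import numpy as np

# seq: a pose sequence of shape (T, 17, 2) -- T frames of 2D keypoints.
# COCO-17 joint indices below are an assumption, not the paper's spec.

def body_proportion(seq):
    # Static shape cue: torso-to-leg length ratio, averaged over frames.
    shoulder_mid = seq[:, [5, 6]].mean(axis=1)
    hip_mid = seq[:, [11, 12]].mean(axis=1)
    ankle_mid = seq[:, [15, 16]].mean(axis=1)
    torso = np.linalg.norm(shoulder_mid - hip_mid, axis=1)
    legs = np.linalg.norm(hip_mid - ankle_mid, axis=1)
    return float((torso / (legs + 1e-6)).mean())

def gait_velocity(seq, fps=25.0):
    # Dynamic cue: mean hip-centre speed between consecutive frames.
    hip_mid = seq[:, [11, 12]].mean(axis=1)
    step = np.linalg.norm(np.diff(hip_mid, axis=0), axis=1)
    return float(step.mean() * fps)

def skeletal_motion(seq):
    # Dense motion cue: per-joint frame-to-frame displacements,
    # flattened to one vector per transition, shape (T - 1, 34).
    return np.diff(seq, axis=0).reshape(seq.shape[0] - 1, -1)
```

In the framework described above, vectors like these would then be embedded by the ResNet-50 feature extractor before the MFF module fuses the three streams.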