Abstract
Gait recognition in unconstrained environments is severely hampered by variations in viewpoint, clothing, and carrying conditions. To address this, we introduce HierarchGait, a key-frame-aware hierarchical learning framework. Our approach integrates three complementary modules: a TemplateBlock-based Motion Extraction (TBME) module for coarse-to-fine anatomical feature learning, a Sequence-level Spatio-temporal Feature Aggregator (SSFA) that identifies and prioritizes discriminative key-frames, and a Frame-level Feature Re-segmentation Extractor (FFRE) that captures fine-grained motion details. This synergistic design yields a robust and comprehensive gait representation. We demonstrate the effectiveness of our method through extensive experiments. On the challenging CASIA-B dataset, HierarchGait achieves new state-of-the-art average Rank-1 accuracies of 98.1% under Normal (NM), 95.9% under Bag (BG), and 87.5% under Coat (CL) conditions. Furthermore, on the large-scale OU-MVLP dataset, our model attains 91.5% average accuracy. These results validate the benefit of explicitly modeling anatomical hierarchies and temporal key moments for robust gait recognition.