Abstract
Human pose estimation (HPE) is crucial for analyzing student behavioral dynamics and supporting instructional evaluation in smart classrooms. However, in complex scenarios such as densely seated students, existing methods often struggle with keypoint feature extraction and localization accuracy. To address these issues, we propose a Feature-enhanced high-resolution network (FE-HRNet) for human pose estimation. First, the model incorporates Res2Net modules into the backbone network, constructing a hierarchical residual connection structure that achieves fine-grained multi-scale feature representation and expands the network's receptive field. Second, we embed a Multi-scale convolution attention (MSCA) module, which captures spatial context at different scales through multi-branch depth-wise strip convolutions and combines them with a channel attention mechanism to adaptively enhance key features, improving keypoint localization. Experimental results on the public COCO dataset and our custom-built Smart classroom pose (SCP) dataset show that the proposed method achieves superior pose estimation performance in complex scenarios. The code is available at https://github.com/ldxguet/FEHRNet.
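As a rough illustration of why a hierarchical residual structure yields multi-scale features and a larger receptive field, the sketch below follows the standard Res2Net formulation: the input channels are split into s groups, the first group is passed through unchanged, and each later group is convolved after adding the previous group's output. The function name `res2net_receptive_fields`, the choice of s = 4 groups, and the 3x3 kernel with stride 1 and no dilation are assumptions made for this example and are not taken from the FE-HRNet implementation.

```python
def res2net_receptive_fields(scales: int, kernel_size: int = 3) -> list[int]:
    """Receptive field (pixels along one axis, relative to the block's
    split input) of each channel group's output in a Res2Net-style block.

    Assumes stride-1, undilated convolutions (an illustrative assumption).
    """
    fields: list[int] = []
    for i in range(scales):
        if i == 0:
            # y_1 = x_1: the first group is an identity pass-through.
            rf = 1
        elif i == 1:
            # y_2 = K_2(x_2): a single convolution.
            rf = kernel_size
        else:
            # y_i = K_i(x_i + y_{i-1}): convolving the previous output
            # widens its receptive field by (kernel_size - 1).
            rf = fields[-1] + kernel_size - 1
        fields.append(rf)
    return fields


# Hypothetical setting: 4 channel groups, 3x3 convolutions.
print(res2net_receptive_fields(4))  # [1, 3, 5, 7]
```

Concatenating the four group outputs gives a single feature map that mixes receptive fields of 1, 3, 5, and 7 pixels, which is the sense in which one such block provides a multi-scale representation.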