Abstract
Amid the ongoing integration of urban and rural areas, the sports industry is shifting toward intelligent transformation and high-quality development. To meet the growing demand for efficient, computationally inexpensive pose recognition in intelligent fitness and entertainment scenarios, this study proposes an optimized method based on lightweight deep learning. The DeepLabV3+ semantic segmentation model is used to extract the coordinates of sports equipment. For human keypoint detection, a streamlined version of the OpenPose network is enhanced with a spatial attention module, which improves the model's ability to capture critical local features. The method was evaluated in comparative experiments across multiple datasets and keypoint positions. The DeepLabV3+-Cross Stage Partial (CSP)-Darknet53 model performed best, with F1 score, accuracy, and recall all exceeding 0.9. It outperformed the baseline algorithms at every test point, reaching its highest accuracy (0.97) at point 10, its highest precision (0.98) at point 11, its highest recall (0.98) at points 10 and 16, and its best F1 score (0.98) at point 11. By combining DeepLabV3+ semantic segmentation, a lightweight OpenPose architecture, and a spatial attention mechanism, this study provides an effective and computationally efficient solution for sports pose recognition. The proposed model shows strong recognition performance across multiple metrics and supports the advancement of intelligent sports applications.
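For readers less familiar with the four metrics reported above, the following minimal sketch shows how accuracy, precision, recall, and F1 score are conventionally derived from confusion counts at a single point. It is illustrative only: the abstract does not state how a detection is judged correct, so the `Counts` container, the `point_metrics` function, and the example numbers are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical confusion counts for one keypoint (not from the paper)."""
    tp: int  # keypoint present and correctly detected
    fp: int  # keypoint reported where none should be
    fn: int  # keypoint present but missed
    tn: int  # keypoint absent and correctly not reported


def point_metrics(c: Counts) -> dict:
    """Return accuracy, precision, recall, and F1 using the standard definitions."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up counts chosen only to illustrate the arithmetic.
    example = Counts(tp=90, fp=10, fn=10, tn=90)
    for name, value in point_metrics(example).items():
        print(f"{name}: {value:.2f}")
```

Because F1 is the harmonic mean of precision and recall, it can only be high when both are high, which is why it is reported alongside them rather than in their place.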