Abstract
OBJECTIVE: Assessing the types of physical activity (PA) that young children engage in is crucial for exploring the relationship between PA and health. Although existing machine learning methods have made progress, they remain limited in non-contact monitoring, real-time dynamic monitoring, and recognition accuracy. This study aimed to use computer vision to build non-contact, real-time, high-precision models that recognize PA types in children aged 3–6 years, both for individual age groups and for the full age range, providing a scientific and practical tool for monitoring children’s daily PA and preventing health problems such as obesity. METHODS: A portable Drift Ghost XL camera was used to record video of 11 PA types performed by 72 children aged 3 to 6 years (mean age = 4.7 ± 0.9 years). The activities were sitting still, sitting activity, standing still, standing activity, walking, running, crawling, jumping, cycling, climbing, and stair walking. Each type was recorded for 8 to 10 min. A total of 18,870 valid samples were obtained using the time-slice method. The PA type dataset was labeled using LabelImg software and expanded fivefold through data augmentation. Based on Yolov11 object detection, computer vision recognition models for PA types were built for children aged 3–4, 4–5, 5–6, and 3–6 years. Model performance was evaluated using metrics such as F1 score and mAP. RESULTS: The model for children aged 3–6 achieved an F1 score of 96.0%. Among the age-specific models, the models for children aged 3–4 and 4–5 achieved the highest F1 score (95.0%), while the model for children aged 5–6 reached 94.0%. In terms of recognition accuracy, the model for children aged 3–6 achieved 98.3%.
Among the age-specific models, the model for children aged 3–4 had the highest recognition accuracy (98.2%), followed by the models for children aged 4–5 (98.0%) and 5–6 (97.6%). In the model for children aged 3–6, Yolov11 achieved recognition rates ranging from 96.9% to 99.4% across the 11 PA types: 98.9% (sitting still), 99.1% (sitting activity), 98.2% (standing still), 97.9% (standing activity), 97.7% (walking), 98.4% (running), 98.3% (crawling), 97.7% (jumping), 98.7% (cycling), 99.4% (climbing), and 96.9% (stair walking). CONCLUSIONS: Computer vision could effectively recognize PA types in children aged 3–6 under non-wearable, real-time dynamic conditions, with performance surpassing that of existing machine learning techniques that depend on wearable devices. The age-specific models (children aged 3–4, 4–5, and 5–6) also demonstrated excellent performance, confirming the method’s effectiveness and applicability across different developmental stages of early childhood.
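
For readers unfamiliar with the F1 score reported above, the following is a minimal sketch of how such a score can be computed from precision and recall. The function names `f1_score` and `macro_f1` are hypothetical, and the inputs in the usage note are illustrative values, not data from this study. The abstract does not state how F1 was aggregated across the 11 PA types, so the unweighted (macro) average shown here is an assumption.

```python
from typing import Iterable, Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_class: Iterable[Tuple[float, float]]) -> float:
    """Unweighted mean of per-class F1 over (precision, recall) pairs.

    Macro averaging is an assumption; the study may aggregate differently.
    """
    scores = [f1_score(p, r) for p, r in per_class]
    return sum(scores) / len(scores)
```

As a usage example, `macro_f1([(1.0, 1.0), (0.5, 0.5)])` averages the per-class F1 values of two hypothetical classes with equal weight, regardless of how many samples each class contains.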