Abstract
Recent advances in computer vision and learning-based approaches have led to the emergence of markerless human pose estimation as a promising alternative to established motion capture methods. Markerless methods offer several advantages, including low hardware requirements, affordability, fast setup, and reduced post-processing effort. However, their accuracy and the speed at which poses are inferred tend to be lower than those of marker-based optical motion capture, which might restrict their range of applications. This study assessed the accuracy, precision, and inference speed of 11 different open-source monocular markerless human pose estimators. For this purpose, we created Physio2.2M, a dataset comprising 2.2 million RGB frames of 25 unimpaired participants engaged in physical exercise, paired with the corresponding ground truth measurements from an optical motion capture system using passive markers. The mean per joint position error between markerless human pose estimators and marker-based optical motion capture was found to be in the range of 72 to 122 mm in 2D within the image plane and 146 to 249 mm in 3D when considering depth. The knee flexion angle was measured with a mean absolute error of [Formula: see text] in 2D and [Formula: see text] in 3D. The elbow flexion angle was measured with a mean absolute error of [Formula: see text] in 2D and [Formula: see text] in 3D. Some of the investigated 2D human pose estimators can achieve accuracies comparable to visual assessments. The inference speed of direct pose estimators ranged between 25 and 200 FPS, and 2D-to-3D lifting methods achieved inference speeds of 117 to 9341 FPS. The accuracy and precision varied greatly between different pose estimators and between different image dimensions. This study offers a comparison of the performance of different pose estimators in applications involving physical activities and highlights current limitations.
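To make the reported metrics concrete, the following minimal sketch shows how a mean per joint position error and a joint angle mean absolute error could be computed from paired keypoints. It is an illustration under stated assumptions, not the evaluation code of this study: the function names (`mpjpe`, `flexion_angle`, `mean_absolute_error`) are hypothetical, both pose sources are assumed to be expressed in the same coordinate frame and units, and the flexion angle is assumed to be 180° minus the angle enclosed by the two adjacent segments (so that a fully extended limb gives 0°). The abstract does not state the exact angle convention or any alignment step.

```python
import math


def mpjpe(predicted, reference):
    """Mean per joint position error over all frames and joints.

    Both arguments are lists of frames; each frame is a list of joints;
    each joint is a tuple of 2 (image plane) or 3 (with depth) coordinates.
    The result has the same unit as the inputs (e.g. mm).
    """
    distances = [
        math.dist(p, r)  # Euclidean distance between estimate and reference
        for pred_frame, ref_frame in zip(predicted, reference)
        for p, r in zip(pred_frame, ref_frame)
    ]
    return sum(distances) / len(distances)


def flexion_angle(proximal, joint, distal):
    """Flexion angle in degrees from three keypoints, e.g. hip, knee, ankle.

    Assumed convention: 0 degrees for a fully extended limb.
    """
    u = [a - b for a, b in zip(proximal, joint)]  # joint -> proximal segment
    v = [a - b for a, b in zip(distal, joint)]    # joint -> distal segment
    cos_included = sum(a * b for a, b in zip(u, v)) / (
        math.hypot(*u) * math.hypot(*v)
    )
    # Clamp to [-1, 1] to guard against floating-point rounding.
    included = math.degrees(math.acos(max(-1.0, min(1.0, cos_included))))
    return 180.0 - included


def mean_absolute_error(estimates, references):
    """Mean absolute difference between paired scalar measurements."""
    return sum(abs(e - r) for e, r in zip(estimates, references)) / len(estimates)
```

Because the functions accept coordinate tuples of any length, the same code covers both the 2D case within the image plane and the 3D case including depth. It requires Python 3.8 or newer for `math.dist` and the multi-argument form of `math.hypot`.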