Abstract
OBJECTIVE: Accurate assessment of physiotherapy exercises is critical for effective rehabilitation, particularly for elderly and mobility-impaired individuals. While telerehabilitation offers a viable alternative to in-clinic supervision, existing approaches often rely on single-modality sensors, which limits their robustness and adaptability. This study aims to develop a multimodal, markerless framework for reliable home-based physiotherapy exercise recognition.

METHODS: A deep learning-based multimodal framework is proposed that integrates synchronized RGB and depth streams. From the RGB data, two-dimensional keypoints, semantic body-part labels, and contour-based visual descriptors are extracted. Depth silhouettes are used to estimate three-dimensional joint positions and to reconstruct full-body meshes with the Skinned Multi-Person Linear (SMPL) model, along with global shape descriptors such as Zernike moments. The multimodal features are fused and refined using Kernel Fisher Discriminant Analysis, and then classified by a Graph Convolutional Network that captures spatial and temporal relationships.

RESULTS: The proposed framework was evaluated on three publicly available rehabilitation datasets: KIMORE, mRI, and UTKinect-Action3D. The system achieved classification accuracies of 95.30%, 92.70%, and 95.59%, respectively, demonstrating consistent performance across diverse rehabilitation-oriented benchmarks.

CONCLUSIONS: The results suggest that integrating complementary RGB- and depth-based representations can enhance the robustness and accuracy of physiotherapy exercise recognition in home-based settings. The proposed framework shows potential for supporting accessible telerehabilitation; future work will focus on broader validation and practical deployment considerations.