Abstract
Designing a fall detection system for sports activities that is both accurate and computationally efficient is a significant challenge. To address this, we propose a novel multi-stage fall detection framework that integrates 3D pose sequences with temporal convolutional modeling. The framework first performs 2D human pose estimation to extract and enhance multi-scale spatial features. It then lifts the 2D poses to 3D using a domain transfer architecture that aligns the 2D and 3D poses within a shared semantic space. Finally, we introduce a robust fall detection network that applies temporal convolutions to the 3D pose sequences, capturing long-term dependencies while keeping the computational cost of fall event recognition low. Evaluated on the large-scale benchmark action dataset NTU RGB+D, our method achieves a fall detection accuracy of 99.87%, demonstrating its state-of-the-art performance and effectiveness.
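The core temporal-convolution idea the abstract describes — processing a 3D pose sequence with dilated convolutions so the receptive field covers long time spans at low per-layer cost — can be sketched as follows. This is a minimal illustrative implementation, not the paper's actual network: the kernel size (3), dilation schedule (1, 2, 4), channel width (32), and pose dimensionality (17 joints × 3 coordinates) are all assumptions made for the example.

```python
import random

def temporal_conv(seq, weights, bias, dilation):
    """Causal dilated 1D convolution along the time axis.
    seq: T x D_in list of per-frame feature vectors.
    weights: K x D_in x D_out; bias: length D_out."""
    K, D_out = len(weights), len(bias)
    out = []
    for t in range(len(seq)):
        acc = list(bias)
        for k in range(K):
            ti = t - (K - 1 - k) * dilation  # only past frames: causal
            if ti < 0:
                continue
            for i, x in enumerate(seq[ti]):
                for j in range(D_out):
                    acc[j] += x * weights[k][i][j]
        out.append([max(0.0, a) for a in acc])  # ReLU activation
    return out

def make_layer(d_in, d_out, k=3, seed=0):
    """Random weights for illustration only (a real model would be trained)."""
    rnd = random.Random(seed)
    w = [[[rnd.uniform(-0.1, 0.1) for _ in range(d_out)]
          for _ in range(d_in)] for _ in range(k)]
    b = [0.0] * d_out
    return w, b

# A 60-frame 3D pose sequence: 17 joints x 3 coordinates = 51 features/frame
# (hypothetical shapes, chosen only for the demo).
T, D = 60, 51
rnd = random.Random(1)
seq = [[rnd.gauss(0, 1) for _ in range(D)] for _ in range(T)]

# Stacking layers with dilations 1, 2, 4 grows the receptive field to
# 1 + (k-1)*(1+2+4) = 15 frames, while each layer stays cheap -- this is
# how dilated temporal convolutions capture long-term dependencies.
x = seq
for dilation in (1, 2, 4):
    w, b = make_layer(len(x[0]), 32, seed=dilation)
    x = temporal_conv(x, w, b, dilation)

print(len(x), len(x[0]))  # one 32-dim feature per frame: 60 32
```

A classifier head (e.g. temporal pooling followed by a fall / non-fall logit) would sit on top of these per-frame features; that final stage is omitted here since the abstract does not specify it.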