Abstract
The graph convolutional network (GCN) has become a mainstream technique in skeleton-based action recognition since it was first applied to the field. However, previous studies have often overlooked the pivotal role of heuristic model initialization in the extraction of spatial features, which prevents the model from reaching its optimal performance. To address this issue, a lightweight initialization-enhanced adaptive graph convolutional network (LI-AGCN) is proposed, which effectively captures spatiotemporal features while maintaining low computational complexity. LI-AGCN employs three coordinate-based input branches (CIB) to dynamically adjust graph structures, which facilitates the extraction of informative spatial features. In addition, the model incorporates a lightweight multi-scale temporal module to extract temporal features, and employs an attention module that considers the temporal, spatial, and channel dimensions simultaneously to enhance key features. Finally, the proposed model is evaluated on three large-scale public datasets: NTU RGB+D, NTU RGB+D 120, and UAV-Human. The experimental results show that LI-AGCN achieves strong overall performance on these datasets, notably reaching 90.03% accuracy on the cross-subject benchmark of the NTU RGB+D dataset with only 0.18 million parameters, which demonstrates the effectiveness of the model.
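To make the idea of dynamically adjusting a graph structure concrete, the following is a minimal sketch of a generic adaptive graph convolution on a toy three-joint skeleton. It is not the LI-AGCN implementation: the abstract does not specify how the CIB derive the adjustment, so the `offset` matrix, the function names, and all numeric values are assumptions chosen purely for illustration.

```python
# Illustrative sketch only: one adaptive graph-convolution step on a toy
# 3-joint skeleton. All names and values here are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalized_adjacency(edges, num_joints):
    """Row-normalized adjacency with self-loops for an undirected skeleton."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)]
           for i in range(num_joints)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in adj]

def adaptive_graph_conv(features, adj, offset, weight):
    """Compute (adj + offset) @ features @ weight.

    `offset` stands in for a learned correction to the fixed skeleton graph;
    how LI-AGCN actually derives it is not stated in the abstract.
    """
    graph = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(adj, offset)]
    return matmul(matmul(graph, features), weight)

edges = [(0, 1), (1, 2)]                          # chain: joint 0 - 1 - 2
adj = normalized_adjacency(edges, 3)
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 joints x 2 channels
identity = [[1.0, 0.0], [0.0, 1.0]]               # placeholder channel weights

# Fixed graph only: joint 0 aggregates from itself and joint 1.
zero_offset = [[0.0] * 3 for _ in range(3)]
fixed_out = adaptive_graph_conv(features, adj, zero_offset, identity)

# Adjusted graph: the offset adds a link from joint 2 to joint 0,
# which is not a physical bone in the chain.
offset = [[0.0, 0.0, 0.5], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
adapted_out = adaptive_graph_conv(features, adj, offset, identity)
```

In this sketch the offset lets joint 0 receive information from joint 2 even though the two are not directly connected in the skeleton, which is the general motivation for adaptive graph structures in skeleton-based models.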