Skeleton motion topology-masked prediction and contrastive learning for self-supervised human action recognition

Abstract

To address limited data augmentation and the neglect of joint dependencies in self-supervised human action recognition, this paper proposes a hybrid framework that integrates topology-masked motion modeling with contrastive learning. The proposed motion topology-masking technique jointly encodes skeletal topology and motion dynamics, which prevents the model from over-focusing on the temporally salient regions of prominent motions. A multi-stage hybrid augmentation strategy combines conventional and extreme augmentations to generate diverse positive pairs for contrastive learning. In addition, a trajectory-guided feature dropping module selectively discards critical features based on trajectory attention maps, which prevents the model from focusing excessively on local joint trajectories. Because the framework learns from large-scale unlabeled skeleton data through self-supervision, it significantly reduces reliance on costly annotated datasets. Extensive experiments on NTU-60, NTU-120, and PKU-MMD show that the proposed model achieves superior performance both in occluded scenarios under complex environments and under low-supervision conditions. It mitigates visual interference and annotation scarcity while substantially improving action recognition accuracy.
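The feature dropping idea can be illustrated with a minimal sketch. The abstract does not say how the trajectory attention maps are computed or how many features are discarded, so the function name `drop_top_attended`, the `(T, V, C)` feature layout, and the `drop_ratio` parameter are all assumptions made for illustration, not the authors' implementation. The sketch takes an attention map as given and zeroes the features at the most-attended (frame, joint) positions.

```python
import numpy as np


def drop_top_attended(features, attention, drop_ratio=0.1):
    """Zero out features at the most-attended (frame, joint) positions.

    features:   array of shape (T, V, C): frames, joints, channels (assumed layout)
    attention:  array of shape (T, V): non-negative attention scores (assumed given)
    drop_ratio: hypothetical fraction of positions to discard
    """
    if features.shape[:2] != attention.shape:
        raise ValueError("attention must match the first two feature axes")

    num_positions = attention.size
    k = int(round(drop_ratio * num_positions))

    keep_mask = np.ones(num_positions, dtype=bool)
    if k > 0:
        # Indices of the k largest scores in the flattened attention map.
        top_idx = np.argpartition(attention.ravel(), -k)[-k:]
        keep_mask[top_idx] = False
    keep_mask = keep_mask.reshape(attention.shape)

    # Broadcast the (T, V) mask over the channel axis.
    return features * keep_mask[..., None], keep_mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(16, 25, 8))   # 16 frames, 25 joints, 8 channels
    attn = rng.random(size=(16, 25))       # stand-in for a trajectory attention map
    dropped, mask = drop_top_attended(feats, attn, drop_ratio=0.2)
    print("positions kept:", int(mask.sum()), "of", mask.size)
```

Dropping the highest-scoring positions, rather than random ones, forces a downstream encoder to rely on the remaining and less dominant joints, which matches the stated goal of avoiding excessive focus on local joint trajectories.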
