Abstract
This study proposes a real-time augmented reality gesture interaction algorithm based on the Swin Transformer and a masked autoencoder, addressing the challenges that traditional Transformer models face in spatio-temporal feature extraction and real-time performance. During data preprocessing, the study uses a synthetic data annotation method to automatically generate 3D gesture images and annotate joint information, significantly improving annotation efficiency. Using weighted Euclidean distance and structural similarity optimization, the paper proposes an image denoising model based on maximum a posteriori probability that effectively reduces noise interference in gesture image analysis. The gesture detection and segmentation module combines EfficientNet and Transformer models: it fuses shallow and deep features through skip connections, realizes multi-scale feature extraction, and sharpens attention on the target region through a triplet attention module. Additionally, the paper introduces a local texture feature prior (RTHLBP) to improve gesture recognition and segmentation accuracy. In the gesture classification module, the paper proposes a ViT architecture based on a masked autoencoder that aligns features at different levels through a dynamic weight fusion strategy and uses the relative total variation map as a self-supervision signal, significantly improving classification performance. Experimental results demonstrate that the proposed model's accuracy, F1 score, and MIoU on the four GTEA sub-datasets surpass those of traditional CNN, Transformer, MobileNet, and DenseNet models, particularly on small datasets. The paper also optimizes the model's real-time performance through a multi-core parallel computing strategy; experiments show that as the number of DSP cores increases, computation time drops significantly while computational efficiency remains high.