Advanced gesture recognition in Indian sign language using a synergistic combination of YOLOv10 with Swin Transformer model


Abstract

Communication between deaf or mute individuals and hearing persons is often hindered by the lack of a mutually understood sign or spoken language. To bridge this gap, Indian Sign Language Recognition (ISLR) systems are essential. This paper proposes a real-time ISLR framework based on the YOLOv10-ST model, which integrates the Swin Transformer into the YOLOv10 architecture for enhanced feature extraction. The model also incorporates the Mish activation function to improve gradient flow and detection accuracy. A custom dataset comprising 15,000 static images (1,000 per sign for 15 signs) and 35 dynamic videos (covering 7 sign classes) was used for training and evaluation. Experimental results demonstrate high performance: the model achieves 97.50% precision, 98.10% recall, and a 96.58% F1-score for image-based sign recognition, and 95.24% precision, 96.00% recall, and a 95.87% F1-score for video-based gestures. The model also reaches a mean Average Precision (mAP) of 97.62% and a real-time inference speed of 48.7 FPS. Ablation studies validate the contributions of the Swin Transformer and Mish activation, and paired t-tests confirm statistical significance (p < 0.05). These findings demonstrate that the YOLOv10-ST model efficiently recognizes static and dynamic ISL in real time with minimal computational overhead.
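The abstract credits the Mish activation with improved gradient flow. Mish is a published activation defined as f(x) = x · tanh(softplus(x)); the paper's implementation is not shown, so the following is only a minimal PyTorch sketch of that standard definition, with a generic conv-BN-activation block (not the authors' exact module) to illustrate where it would slot into a detector backbone:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    """Mish activation: f(x) = x * tanh(softplus(x)).

    Smooth and non-monotonic, so small gradients still flow
    for negative inputs, unlike ReLU's hard zero region.
    """
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.tanh(F.softplus(x))

# Illustrative use: a generic conv-BN-activation unit with Mish
# swapped in (a hypothetical block, not the paper's architecture).
conv_block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(64),
    Mish(),
)
out = conv_block(torch.randn(1, 3, 224, 224))  # -> (1, 64, 224, 224)
```

Recent PyTorch releases also ship this activation as nn.Mish, so a custom module like the one above is only needed on older environments or when experimenting with variants of the formula.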
