A hybrid CNN-transformer framework optimized by Grey Wolf Algorithm for accurate sign language recognition


Abstract

This paper introduces the Grey Wolf Optimized Convolutional Transformer Network (GWO-CTransNet), a hybrid deep learning framework for accurate and efficient recognition of dynamic hand gestures, particularly in American Sign Language (ASL). The model integrates Convolutional Neural Networks (CNNs) for spatial feature extraction, Transformers for temporal sequence modeling, and Grey Wolf Optimization (GWO) for hyperparameter tuning. Extensive experiments on two benchmark datasets, ASL Alphabet and ASL MNIST, validate the model's effectiveness in both static and dynamic sign classification. The proposed model achieved superior performance across all key metrics, including an accuracy of 99.40%, an F1-score of 99.31%, a Matthews Correlation Coefficient (MCC) of 0.988, and an Area Under the Curve (AUC) of 0.992, surpassing existing models such as PCA-IGWO, KPCA-IGWO, GWO-CNN, and AEGWO-NET. Real-time gesture detection outputs further demonstrated the model's robustness under varied environmental conditions and its applicability to assistive communication technologies. Moreover, integrating GWO not only accelerated convergence but also improved generalization by selecting optimal model configurations. The results show that GWO-CTransNet offers a powerful, scalable solution for vision-based sign language recognition systems, combining high accuracy, fast inference, and adaptability to real-world applications.
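To illustrate the GWO component mentioned in the abstract, the sketch below implements the standard Grey Wolf Optimizer update (alpha/beta/delta leaders with a linearly decaying encircling coefficient) on a toy two-dimensional objective. This is a minimal illustration, not the paper's implementation: the objective function is a hypothetical stand-in for validation loss over two hyperparameters (e.g. a learning-rate exponent and a dropout rate), and all parameter names and bounds are assumptions.

```python
import random

def gwo_minimize(objective, bounds, n_wolves=12, n_iters=60, seed=0):
    """Minimal Grey Wolf Optimizer: the three best wolves (alpha, beta,
    delta) guide the rest of the pack toward the optimum."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialize the pack uniformly at random within the search bounds.
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(n_wolves)]
    for t in range(n_iters):
        # Rank the pack; the three fittest wolves lead the search.
        wolves.sort(key=objective)
        alpha, beta, delta = wolves[0], wolves[1], wolves[2]
        a = 2 - 2 * t / n_iters  # encircling coefficient decays from 2 to 0
        for i in range(n_wolves):
            new_pos = []
            for d in range(dim):
                x = 0.0
                # Average the positions proposed by each leader.
                for leader in (alpha, beta, delta):
                    r1, r2 = rng.random(), rng.random()
                    A = 2 * a * r1 - a
                    C = 2 * r2
                    x += leader[d] - A * abs(C * leader[d] - wolves[i][d])
                lo, hi = bounds[d]
                new_pos.append(min(max(x / 3, lo), hi))  # clamp to bounds
            wolves[i] = new_pos
    return min(wolves, key=objective)

# Hypothetical stand-in for validation loss over two hyperparameters,
# minimized at (-3, 0.3); a real run would train and evaluate the model.
best = gwo_minimize(lambda p: (p[0] + 3) ** 2 + (p[1] - 0.3) ** 2,
                    bounds=[(-5, -1), (0.0, 0.6)])
```

In a hyperparameter-tuning setting such as the one the abstract describes, the objective would instead train the CNN-Transformer with the candidate configuration and return its validation error, so each wolf position encodes one candidate configuration.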
