A deep learning-based method combines manual and non-manual features for sign language recognition


Abstract

This research examines the advantages of using 3D hand skeletal information for sign language recognition from RGB videos within a state-of-the-art, multi-stream deep learning framework. Because most sign language datasets consist of standard RGB video without depth information, we adopt a robust architecture originally developed for 3D human pose estimation to infer the 3D coordinates of hand joints from RGB data. We then combine these estimates with additional sign language feature streams, such as convolutional neural network representations of the hand and head pose estimates, using an attention-based encoder-decoder to recognize the signs. We evaluate the proposed method on corpora of isolated signs from AUTSL and WLASL, demonstrating substantial improvements from incorporating 3D hand posture data: our method achieves 90.5% accuracy on AUTSL and 88.2% on WLASL, with F1-scores above 0.89, outperforming several state-of-the-art approaches.
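The attention-based fusion of multiple feature streams described above can be sketched minimally as follows. All names, dimensions, and values here are illustrative assumptions, not the paper's actual implementation: a decoder query attends over per-stream feature vectors (3D hand joints, CNN hand features, head pose) and produces a fused representation via scaled dot-product attention.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def attend(query, keys, values):
    """Scaled dot-product attention: weight each stream by its
    similarity to the query, then return the weighted sum."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, ks)) / math.sqrt(d)
              for ks in keys]
    weights = softmax(scores)
    fused = [sum(w * v[i] for w, v in zip(weights, values))
             for i in range(len(values[0]))]
    return fused, weights

# Hypothetical per-stream feature vectors (illustrative only).
hand_3d  = [0.9, 0.1, 0.0, 0.2]   # 3D hand-joint embedding
hand_cnn = [0.4, 0.8, 0.1, 0.0]   # CNN features of the hand crop
head     = [0.1, 0.2, 0.7, 0.3]   # head-pose features
streams = [hand_3d, hand_cnn, head]

query = [0.5, 0.5, 0.5, 0.5]      # decoder state (illustrative)
fused, weights = attend(query, streams, streams)
print("attention weights:", [round(w, 3) for w in weights])
print("fused features:   ", [round(f, 3) for f in fused])
```

In a full model the query would come from the decoder's hidden state at each step, and each stream would be a learned embedding rather than a fixed vector; the sketch only shows how attention lets the recognizer weight the streams adaptively per sign.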
