SSTA-ResT: Soft Spatiotemporal Attention ResNet Transformer for Argentine Sign Language Recognition


Abstract

Sign language recognition technology serves as a crucial bridge between deaf and hearing individuals and plays a substantial role in promoting social inclusivity. Conventional sign language recognition methods that rely on static images are inadequate for capturing the dynamic characteristics and temporal information inherent in sign language, which limits their practical applicability in real-world scenarios. The proposed framework, SSTA-ResT, addresses this limitation by integrating ResNet, a soft spatiotemporal attention (SSTA) module, and Transformer encoders. The framework uses ResNet to extract robust spatial feature representations, employs the lightweight SSTA module for dual-path complementary representation enhancement that strengthens spatiotemporal associations, and leverages the Transformer encoder to capture long-range temporal dependencies. Experimental results on the LSA64 Argentine Sign Language dataset demonstrate that the proposed method achieves an accuracy of 96.25%, a precision of 97.18%, and an F1 score of 0.9671, surpassing existing methods on all metrics while keeping the parameter count relatively low at 11.66 M. These results demonstrate the framework's effectiveness and practicality for sign language video recognition.
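The three-stage pipeline described in the abstract (per-frame spatial features → SSTA enhancement → Transformer temporal modeling) can be sketched as follows. The abstract does not specify the internals of the SSTA module or the exact ResNet variant, so the dual-path gating below and the small stand-in CNN backbone are assumptions for illustration, not the authors' implementation:

```python
import torch
import torch.nn as nn

class SSTA(nn.Module):
    """Hypothetical soft spatiotemporal attention: two complementary soft-gated
    paths (feature-wise and frame-wise) added back to the input. The real SSTA
    design is not given in the abstract; this is an illustrative guess."""
    def __init__(self, dim):
        super().__init__()
        self.spatial = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.temporal = nn.Sequential(nn.Linear(dim, 1), nn.Softmax(dim=1))

    def forward(self, x):            # x: (B, T, D) per-frame features
        s = x * self.spatial(x)      # soft gating over feature channels
        t = x * self.temporal(x)     # soft attention over the T frames
        return x + s + t             # dual-path complementary enhancement

class SSTAResT(nn.Module):
    """Sketch of the SSTA-ResT pipeline: backbone -> SSTA -> Transformer -> head."""
    def __init__(self, num_classes=64, dim=128):
        super().__init__()
        # Stand-in CNN backbone; the paper uses ResNet for spatial features.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.ssta = SSTA(dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, clip):         # clip: (B, T, 3, H, W) video frames
        B, T = clip.shape[:2]
        f = self.backbone(clip.flatten(0, 1)).view(B, T, -1)  # per-frame features
        f = self.ssta(f)             # spatiotemporal enhancement
        f = self.transformer(f)      # long-range temporal dependencies
        return self.head(f.mean(dim=1))  # temporal pooling + classification
```

LSA64 contains 64 sign classes, hence the default `num_classes=64`; a forward pass on a batch of clips of shape `(B, T, 3, H, W)` yields class logits of shape `(B, 64)`.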
