Hybrid Convolutional Vision Transformer for Robust Low-Channel sEMG Hand Gesture Recognition: A Comparative Study with CNNs


Abstract

Hand gesture classification using surface electromyography (sEMG) is fundamental to prosthetic control and human-machine interaction. However, most existing studies focus on high-density recordings or large gesture sets, leaving limited evidence on performance in low-channel, reduced-gesture configurations. This study addresses that gap by comparing a classical convolutional neural network (CNN), inspired by Atzori's design, with a Convolutional Vision Transformer (CViT) tailored to compact sEMG systems. Two datasets were evaluated: a proprietary Myo-based collection (10 subjects, 8 channels, 6 gestures) and a subset of NinaPro DB3 (11 transradial amputees, 12 channels, the same 6 gestures). Both models were trained with standardized preprocessing, segmentation, and balanced windowing procedures. Results show that the CNN performs robustly on homogeneous signals (Myo: 94.2% accuracy) but exhibits greater variability on amputee recordings (NinaPro: 92.0%). In contrast, the CViT consistently matches or surpasses the CNN, reaching 96.6% accuracy on Myo and 94.2% on NinaPro. Statistical analyses confirm that the difference between the two models is significant on the Myo dataset. The objective of this work is to determine whether hybrid CNN-ViT architectures provide superior robustness and generalization under low-channel sEMG conditions. Rather than proposing a new architecture, this study delivers the first systematic benchmark of CNN and CViT models across amputee and non-amputee subjects using short windows, heterogeneous signals, and identical protocols, highlighting their suitability for compact prosthetic-control systems.
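The segmentation step described in the abstract, slicing a continuous sEMG recording into short, fixed-length windows, can be sketched as below. This is a minimal illustration, not the authors' pipeline: the window length (52 samples) and stride (13 samples) are hypothetical values chosen for the example, since the abstract only states that short windows were used; the 8-channel, 200 Hz configuration matches the Myo armband mentioned in the text.

```python
import numpy as np

def segment_windows(emg: np.ndarray, window: int, stride: int) -> np.ndarray:
    """Slice a (samples, channels) sEMG recording into overlapping
    fixed-length windows. Returns (n_windows, window, channels)."""
    n = (emg.shape[0] - window) // stride + 1
    return np.stack([emg[i * stride : i * stride + window] for i in range(n)])

# Example: 2 s of an 8-channel Myo recording sampled at 200 Hz.
# Window/stride values here are illustrative assumptions.
rec = np.random.randn(400, 8)
wins = segment_windows(rec, window=52, stride=13)
print(wins.shape)  # (27, 52, 8)
```

Each window then becomes one training sample for the CNN or CViT; balancing the window counts per gesture class, as the abstract notes, keeps the classifier from favoring over-represented gestures.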
