Integrating Cross-Modal Semantic Learning with Generative Models for Gesture Recognition


Abstract

Radio frequency (RF)-based human activity sensing is an essential component of ubiquitous computing, with WiFi sensing providing a practical and low-cost solution for gesture and activity recognition. However, challenges such as manual data collection, multipath interference, and poor cross-domain generalization hinder real-world deployment. Existing data augmentation approaches often neglect the biomechanical structure underlying RF signals. To address these limitations, we present CM-GR, a cross-modal gesture recognition framework that integrates semantic learning with generative modeling. CM-GR leverages 3D skeletal points extracted from vision data as semantic priors to guide the synthesis of realistic WiFi signals, thereby incorporating biomechanical constraints without requiring extensive manual labeling. In addition, dynamic conditional vectors are constructed from inter-subject skeletal differences, enabling user-specific WiFi data generation without the need for dedicated data collection and annotation for each new user. Extensive experiments on the public MM-Fi dataset and our SelfSet dataset demonstrate that CM-GR substantially improves cross-subject gesture recognition accuracy, achieving gains of up to 10.26% and 9.5%, respectively. These results confirm the effectiveness of CM-GR in synthesizing personalized WiFi data and highlight its potential for robust and scalable gesture recognition in practical settings.
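The abstract describes conditioning a generative model on vectors built from inter-subject skeletal differences. As an illustration only, the sketch below shows one plausible way such a conditional vector could be formed and fed to a generator; the function names, the per-joint difference formulation, and the toy linear generator are all assumptions, since the abstract does not specify CM-GR's actual architecture.

```python
import numpy as np

def skeletal_condition(subject_skeleton, reference_skeleton):
    """Hypothetical conditional vector: flattened per-joint 3D offsets
    between a new subject's skeleton and a reference skeleton."""
    diff = subject_skeleton - reference_skeleton  # shape (joints, 3)
    return diff.flatten()

def generate_wifi_sample(cond, noise_dim=16, feat_dim=64, seed=0):
    """Toy stand-in for the conditional generator: concatenates noise
    with the skeletal condition and maps it through one random linear
    layer to a synthetic WiFi feature vector (not the paper's model)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(noise_dim)
    inp = np.concatenate([z, cond])
    W = rng.standard_normal((feat_dim, inp.size)) / np.sqrt(inp.size)
    return np.tanh(W @ inp)

# Example: 17 body joints with 3D coordinates per subject
ref = np.zeros((17, 3))                 # reference subject's skeleton
subj = np.full((17, 3), 0.05)           # new user differs slightly
cond = skeletal_condition(subj, ref)    # 17 * 3 = 51-dim condition
sample = generate_wifi_sample(cond)     # synthetic 64-dim WiFi feature
```

In a full system, the random linear map would be replaced by a trained generator, and the synthetic samples would augment the new user's training data without dedicated collection.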
