Abstract
Exoskeleton robots are augmentation systems that form mechanical couplings with the human body, substantially enhancing the wearer's biomechanical capabilities through assistive torques. We introduce a lumbar spine-assisted exoskeleton based on Variable-Stiffness Pneumatic Artificial Muscles (VSPAM) and develop a dynamic adaptation mechanism that links the pneumatic drive module to the wearer's kinematic intent, enabling human-robot cooperative control. To decode kinematic intent, we propose a multimodal fusion architecture that combines the VGG16 convolutional network with Long Short-Term Memory (LSTM) networks. Incorporating self-attention, we construct a fine-grained relational inference module whose multi-head attention weight matrices capture global spatio-temporal feature dependencies, overcoming the locality constraints of traditional feature extractors. We further employ cross-attention to fuse visual and kinematic features deeply, establishing aligned intermodal correspondence that mitigates the limitations of unimodal perception. Experimental validation demonstrates 96.1% ± 1.2% motion classification accuracy, offering a novel technical solution for rehabilitation robotics and industrial assistance.
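The cross-attention fusion described above can be sketched in miniature: queries drawn from one modality (here, kinematic features) attend over keys and values from the other (visual features from a backbone such as VGG16), so each time step is aligned against all spatial patches. This is a minimal NumPy sketch, not the paper's implementation; the random projection matrices stand in for learned weights, and the dimensions (10 time steps, 49 patches, 64 channels, 4 heads) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(query_feats, key_value_feats, num_heads=4, rng=None):
    """Fuse two modalities: queries from one stream (e.g. kinematic features),
    keys/values from the other (e.g. visual features). Hypothetical sketch;
    the random matrices stand in for learned projections W_Q, W_K, W_V, W_O."""
    rng = np.random.default_rng(0) if rng is None else rng
    t_q, d = query_feats.shape
    assert d % num_heads == 0, "feature dim must split evenly across heads"
    d_h = d // num_heads
    W_q, W_k, W_v, W_o = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
    # Project, then split the channel dim into heads: (heads, tokens, d_h).
    Q = (query_feats @ W_q).reshape(t_q, num_heads, d_h).transpose(1, 0, 2)
    K = (key_value_feats @ W_k).reshape(-1, num_heads, d_h).transpose(1, 0, 2)
    V = (key_value_feats @ W_v).reshape(-1, num_heads, d_h).transpose(1, 0, 2)
    # Scaled dot-product attention: each row of attn aligns one query token
    # (a kinematic time step) with every key token (a visual patch).
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_h)   # (heads, t_q, t_kv)
    attn = softmax(scores, axis=-1)
    fused = (attn @ V).transpose(1, 0, 2).reshape(t_q, d)
    return fused @ W_o, attn

# Usage: 10 kinematic time steps query 49 visual patches, 64-dim features.
kin = np.random.default_rng(1).standard_normal((10, 64))
vis = np.random.default_rng(2).standard_normal((49, 64))
fused, attn = multi_head_cross_attention(kin, vis)
print(fused.shape, attn.shape)  # (10, 64) (4, 10, 49)
```

In a trained model the fused sequence would feed the LSTM and classification head; the attention matrix itself is interpretable, showing which visual regions each time step relied on.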