Abstract
Human activity recognition (HAR) faces significant challenges in capturing temporal patterns at multiple scales while preserving feature propagation through deep networks. Current approaches lose information in deep architectures and extract temporal features inadequately across scales, which limits further improvement in recognition accuracy. To address these issues, we propose HybridHAR, a novel deep learning model that integrates three key ideas. First, it employs a parallel multi-scale CNN structure with different kernel sizes to extract temporal features at multiple scales. Second, it incorporates a residual attention mechanism with channel-wise feature fusion. Third, it includes a deep supervision module with auxiliary classification. We evaluate the model on the UCI HAR dataset and conduct comprehensive comparative experiments to validate the contribution of each idea. Our experimental results show that HybridHAR achieves state-of-the-art performance, with 98.91% validation accuracy and 96.06% test accuracy, significantly outperforming the previous approaches examined in our investigation. These results demonstrate the high performance and robust feature learning capability of HybridHAR, which provides a new model architecture for sensor-based human activity recognition with potential applications in healthcare monitoring, smart environments, and many other fields.