Abstract
Human Activity Recognition (HAR) using data streams from wearable sensors is challenging due to high data dimensionality, noise, and the lack of labeled data in unsupervised settings. Our prior work showed that traditional clustering models, which achieve state-of-the-art performance on simulated datasets, perform poorly on time-series numeric sensor data. This paper explores different autoencoder (AE) architectures to extract latent features with reduced dimensionality from streaming HAR datasets; these features are then clustered to identify different activity patterns. Since the vanilla AE has shortcomings in learning distinguishing patterns from spatio-temporal time-series sensor data, we augment it with convolutional, long short-term memory (LSTM), and combined convolutional-LSTM layers across multiple design phases. We apply supervised learning to train a superior spatio-temporal feature-extraction AE model. Using the features extracted by the trained AE, we train a clustering model with an unsupervised learning approach. Our end-to-end integrated hybrid convolutional AE+LSTM feature extractor and K-Means clustering model achieves state-of-the-art clustering accuracy of up to 0.99 in terms of Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI) scores on the MobiAct and UCI HAR datasets, improving clustering performance by over 50% compared to previous methods. Further improvements are achieved through rigorous experimentation and advanced data-preprocessing methods. We also present a visualization of the clusters, which explains the transitional activity patterns in the overlapping regions of the clusters.