Abstract
To address the challenge of accurate indoor positioning in complex environments, this paper proposes ResT-IMU, a two-stage indoor positioning method that integrates the ResNet and Transformer architectures. The method first applies Kalman filtering to the IMU data and then segments the filtered data into windows. A residual network extracts motion features by learning the residual mapping of the input data, which improves the model's ability to capture motion changes and predict instantaneous velocity. The self-attention mechanism of the Transformer then captures the temporal features of the IMU data and, in conjunction with the velocity predictions, refines the estimate of movement direction. Finally, a fully connected layer outputs the predicted velocity and direction, from which the trajectory is calculated. During training, the RMSE loss is used to optimize velocity prediction, and the cosine similarity loss is used for direction prediction. Experimental results show that ResT-IMU achieves velocity prediction errors of 0.0182 m/s on the iIMU-TD dataset and 0.014 m/s on the RoNIN dataset. Compared with the ResNet model, ResT-IMU reduces ATE by 0.19 m and RTE by 0.05 m on the RoNIN dataset. Compared with the IMUNet model, it reduces ATE by 0.61 m and RTE by 0.02 m on the iIMU-TD dataset, and ATE by 0.32 m and RTE by 0.33 m on the RoNIN dataset. Compared with the ResMixer model, it reduces ATE by 0.13 m and RTE by 0.02 m on the RoNIN dataset. These improvements indicate that ResT-IMU predicts trajectories more accurately and robustly than these baseline models.
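The two training losses and the final trajectory calculation can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the 2D direction-vector representation, the fixed sampling interval `dt`, and the use of `1 - cosine similarity` as the direction loss are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def rmse_loss(v_pred, v_true):
    """Root-mean-square error between predicted and reference speeds (m/s)."""
    v_pred = np.asarray(v_pred, dtype=float)
    v_true = np.asarray(v_true, dtype=float)
    return float(np.sqrt(np.mean((v_pred - v_true) ** 2)))

def cosine_direction_loss(d_pred, d_true, eps=1e-8):
    """Mean of (1 - cosine similarity) over pairs of direction vectors.

    Assumption: directions are 2D vectors of shape (N, 2); eps guards
    against division by zero for near-zero vectors.
    """
    d_pred = np.asarray(d_pred, dtype=float)
    d_true = np.asarray(d_true, dtype=float)
    dot = np.sum(d_pred * d_true, axis=-1)
    norms = np.linalg.norm(d_pred, axis=-1) * np.linalg.norm(d_true, axis=-1)
    return float(np.mean(1.0 - dot / (norms + eps)))

def integrate_trajectory(speed, direction, dt, start=(0.0, 0.0), eps=1e-8):
    """Accumulate per-window displacements speed * unit_direction * dt.

    Assumption: one speed and one direction per window, with a constant
    time step dt (seconds) between windows. Returns positions of shape (N, 2).
    """
    speed = np.asarray(speed, dtype=float)
    direction = np.asarray(direction, dtype=float)
    # Normalize directions so that only the speed sets the step length.
    unit = direction / (np.linalg.norm(direction, axis=-1, keepdims=True) + eps)
    steps = speed[:, None] * unit * dt
    return np.asarray(start, dtype=float) + np.cumsum(steps, axis=0)

# Hypothetical example: walk 1 m east, then 1 m north.
positions = integrate_trajectory(
    speed=[1.0, 1.0], direction=[[1.0, 0.0], [0.0, 1.0]], dt=1.0
)
```

In this sketch the velocity and direction terms are computed separately; how the two losses are weighted and combined during training is not stated in the abstract and is left open here.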