Abstract
Traffic flow prediction is a critical component of Intelligent Transportation Systems (ITS), playing a vital role in improving travel efficiency and enhancing road safety. However, accurate prediction remains challenging due to complex spatial-temporal dependencies, particularly spatial heterogeneity and multi-scale temporal patterns. To address these challenges, we propose a novel framework, the Multi-Scale Spatial-Temporal Transformer (MSSTFormer). A two-stage spatial attention module extracts spatial features by jointly modeling global and local dependencies and strengthening interactions among strongly correlated key nodes, significantly improving the model’s ability to capture critical spatial relationships. In addition, a novel frequency dual-channel attention module decomposes multi-scale temporal features and separately models long-term trends and short-term fluctuations through the interplay of low- and high-frequency components, enhancing the model’s ability to capture complex temporal dynamics. Furthermore, a gated mechanism in the data embedding layer filters out redundant information, improving input data quality. Experiments on four public traffic datasets demonstrate that MSSTFormer outperforms existing models in prediction accuracy on most datasets. Moreover, we enhance model interpretability by visualizing the learned frequency dual-channel attention weights. Our code is available at https://github.com/whaaaa123/MSSTFormer.