Abstract
With the rapid advancement of intelligent transportation systems (Jiang et al. 2021; Feng et al. 2024), accurately forecasting traffic conditions has become a significant challenge. Numerous advanced neural networks with complex architectures have recently been introduced to address it. Nonetheless, most of these models extract temporal and spatial features separately before combining them, overlooking the inherent relationships between the two. Such independent feature extraction can discard valuable information and limits a model's capacity to exploit the interdependencies between spatial and temporal features. To tackle this challenge, we introduce STIC, a Transformer-based neural network designed to capture crucial information from both the spatial and temporal domains. The main innovation of our method is the use of the cross-attention mechanism within Transformers to sequentially capture and adaptively merge spatiotemporal features from historical data. Experiments on four diverse traffic forecasting datasets show that our model outperforms traditional methods by effectively uncovering the underlying spatial and temporal dependencies in traffic data sequences. Our work offers a new strategy for improving the accuracy of traffic flow prediction.
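The STIC architecture itself is detailed in the body of the paper, but the cross-attention fusion the abstract refers to can be illustrated with a minimal sketch. Here, query vectors from one domain (e.g., temporal features) attend to key/value vectors from the other (e.g., spatial features); all names, shapes, and the single-head, unprojected form are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def cross_attention(queries, keys_values, d_k):
    """Scaled dot-product cross-attention (single head, no learned
    projections, for illustration only).

    queries     : (n_q, d_k) features from domain A (e.g., temporal)
    keys_values : (n_kv, d_k) features from domain B (e.g., spatial),
                  used as both keys and values
    Returns     : (n_q, d_k) domain-A features enriched with domain-B context
    """
    # Similarity of each query to each key, scaled for stability
    scores = queries @ keys_values.T / np.sqrt(d_k)        # (n_q, n_kv)
    # Softmax over the key dimension (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a convex combination of the value rows
    return weights @ keys_values                           # (n_q, d_k)

# Hypothetical example: 4 temporal tokens attending to 6 spatial tokens
rng = np.random.default_rng(0)
temporal = rng.standard_normal((4, 8))
spatial = rng.standard_normal((6, 8))
fused = cross_attention(temporal, spatial, d_k=8)
```

In a full model, learned projection matrices would map each domain into shared query/key/value spaces, and attention could be applied in both directions to merge the two feature streams adaptively.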