Abstract
As Internet of Vehicles (IoV) architectures grow more complex and attack techniques continue to evolve, in-vehicle networks face unprecedented security challenges, yet existing intrusion detection systems (IDSs) still have notable limitations in IoV scenarios. First, traditional IDSs often neglect the spatial-temporal dependencies in network traffic, which limits their ability to model sophisticated attack behaviors. Second, hybrid IDSs that can address both intra-vehicle and external network attacks are still lacking, because the detection capability of existing systems is typically confined to a single environment or attack type. This paper proposes GCN-2-Former, a spatial-temporal model that combines a Graph Convolutional Network (GCN) with a Transformer. The model uses a sliding window mechanism and a dynamic graph construction strategy to map heterogeneous network traffic into spatial-temporal graph structures. The GCN extracts local spatial features, while multi-layer Transformer modules model global temporal dependencies. A graph-level feature fusion strategy then integrates the spatial and temporal features. Experimental results show that the proposed model achieves an accuracy and an F1-score of 99.98% on the CICIDS2017 dataset, which represents external network attacks, and a detection rate of 100% on the Car Hacking dataset, which represents intra-vehicle network attacks. It significantly outperforms existing mainstream methods, demonstrating strong detection capability, robustness, and cross-domain generalization.
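To make the pipeline outlined in the abstract concrete, the following is a minimal NumPy sketch of its four stages: sliding windows, per-window graph construction, a GCN layer, and temporal attention followed by fusion. It is an illustration under stated assumptions, not the GCN-2-Former implementation. The edge rule (link consecutive records and records sharing an identifier), the mean readout, the single-head attention without learned projections, the fusion by concatenation, and all names, sizes, and random weights are hypothetical choices that the abstract does not specify.

```python
import numpy as np

def sliding_windows(data, size, stride):
    """Split an array into overlapping windows along its first axis."""
    return [data[s:s + size] for s in range(0, len(data) - size + 1, stride)]

def build_adjacency(keys):
    """Hypothetical edge rule: link consecutive records and records sharing a key."""
    n = len(keys)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if j == i + 1 or keys[i] == keys[j]:
                adj[i, j] = adj[j, i] = 1.0
    return adj

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of a standard GCN layer."""
    a_tilde = adj + np.eye(len(adj))
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_hat, x, weight):
    """One graph convolution: ReLU(A_hat X W)."""
    return np.maximum(a_hat @ x @ weight, 0.0)

def self_attention(h):
    """Single-head scaled dot-product self-attention (no learned projections)."""
    scores = h @ h.T / np.sqrt(h.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ h

rng = np.random.default_rng(0)
features = rng.normal(size=(20, 4))   # 20 synthetic traffic records, 4 features each
keys = rng.integers(0, 3, size=20)    # synthetic identifier per record (assumption)
weight = rng.normal(size=(4, 8))      # random stand-in for a trained GCN weight

window_size, stride = 8, 4
feature_windows = sliding_windows(features, window_size, stride)
key_windows = sliding_windows(keys, window_size, stride)

graph_embeddings = []
for x, k in zip(feature_windows, key_windows):
    a_hat = normalize_adjacency(build_adjacency(k))
    node_embeddings = gcn_layer(a_hat, x, weight)          # local spatial features
    graph_embeddings.append(node_embeddings.mean(axis=0))  # graph-level readout

spatial = np.stack(graph_embeddings)   # one embedding per window
temporal = self_attention(spatial)     # dependencies across windows
fused = np.concatenate([spatial, temporal], axis=1)  # assumed fusion: concatenation
```

In this sketch each window becomes one graph, so the temporal stage operates on a sequence of graph-level embeddings rather than on individual records. A classifier head on `fused` would complete the detector and is omitted here.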