Abstract
With the rapid development of the internet, phishing attacks have become increasingly diverse, making phishing website detection a key focus of cybersecurity research. Although machine learning and deep learning have produced a variety of phishing URL detection methods, many of them still extract URL features incompletely, which limits their accuracy. This paper proposes CSPPC-BiLSTM, a malicious URL detection model based on Bidirectional Long Short-Term Memory (BiLSTM). The model passes URL character sequences through an embedding layer and captures their contextual information with a BiLSTM. It then integrates the Convolutional Block Attention Module (CBAM), transforming the URL sequence features into a spatial matrix and applying channel and spatial attention to highlight key features. A Spatial Pyramid Pooling (SPP) module then performs multi-scale pooling. Finally, a fully connected layer fuses the features, and dropout regularization improves robustness. Compared with CharBiLSTM, CSPPC-BiLSTM significantly improves detection accuracy. Evaluated on two datasets, Grambedding (balanced) and Mendeley AK Singh 2020 phish (imbalanced), and compared against six baseline models, it demonstrates strong generalization and accuracy. Ablation experiments confirm that CBAM and SPP play a critical role in the performance gains.
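In its standard form, spatial pyramid pooling divides a feature map into an n x n grid at several pyramid levels, pools each cell, and concatenates the results. This means a URL of any length yields a feature vector of fixed size. The sketch below illustrates only that multi-scale pooling idea in plain Python. The pyramid levels (1, 2, 4), the choice of max pooling, and the function names are illustrative assumptions, not the configuration reported for CSPPC-BiLSTM.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _bin_edges(length: int, bins: int) -> List[Tuple[int, int]]:
    # Split [0, length) into `bins` contiguous, non-empty ranges using
    # floor/ceil boundaries (integer arithmetic, as in adaptive pooling).
    return [
        ((i * length) // bins, -((-(i + 1) * length) // bins))
        for i in range(bins)
    ]


def spatial_pyramid_max_pool(
    feature_map: Matrix, levels: Sequence[int] = (1, 2, 4)
) -> List[float]:
    """Max-pool a 2-D feature map into an n x n grid for each pyramid level
    and concatenate the pooled values (assumed levels, for illustration).

    The output length is sum(n * n for n in levels), independent of the
    map's height and width.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled: List[float] = []
    for n in levels:
        for r0, r1 in _bin_edges(rows, n):
            for c0, c1 in _bin_edges(cols, n):
                # Maximum over one grid cell of the current pyramid level.
                pooled.append(
                    max(feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1))
                )
    return pooled
```

With the default levels, a 4 x 4 map and a 7 x 30 map both produce 1 + 4 + 16 = 21 pooled values. This fixed-size output is what allows a fully connected layer to follow the pooling stage regardless of the URL length.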