Abstract
Phishing is a social engineering attack and a form of cybercrime that continues to grow steadily. Phishing attacks affect many sectors, including government, social, and financial institutions as well as individual businesses. Traditional methods of identifying phishing websites, such as blacklist and heuristic approaches, often fail to provide sufficient protection. Techniques that combine URLs, webpage content, and external features are time-consuming, require substantial computing power, and are unsuitable for devices with limited resources. In addition, previous research has often overlooked which features matter most for detection and how they affect the outcome, so the significance of individual features is not fully captured. To address these issues, this research applies a feature selection technique, SHapley Additive exPlanations (SHAP), to each model, relying primarily on URL-based features to improve the detection process. The "Phishing Website Detection" dataset from the Kaggle repository, which contains more than 11,000 URLs and 30 features, was used. Five models, namely support vector machine (SVM), random forest (RF), decision tree (DT), logistic regression (LR), and K-nearest neighbor (KNN), were trained and tested. SHAP was applied to each model to improve precision and interpretability by highlighting the most important features. The models were evaluated using accuracy, precision, recall, and F1 score. Among the models tested, the random forest model achieved an accuracy of 97%. The proposed system offers a comprehensive and interpretable solution for phishing detection that contributes to a safer digital environment.
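To illustrate the idea behind Shapley-based feature attribution, the following minimal sketch computes exact Shapley values for a toy scoring function. It is not the pipeline used in this work: the function `toy_score`, the three feature names, and the score weights are hypothetical and chosen only for illustration. In practice, a library such as SHAP approximates these values for a trained model.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley value of each feature for a set-valued function.

    value_fn takes a frozenset of feature names and returns a number.
    The cost grows exponentially with the number of features, so this
    is only practical for very small feature sets.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for a coalition of size k that excludes f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_score(present):
    """Hypothetical phishing score built from the features in `present`."""
    score = 0.0
    if "has_ip_address" in present:
        score += 0.4
    if "long_url" in present:
        score += 0.2
    # Interaction term: the two features together add extra risk.
    if "has_ip_address" in present and "no_https" in present:
        score += 0.2
    return score


features = ["has_ip_address", "long_url", "no_https"]
phi = shapley_values(toy_score, features)
for name, value in phi.items():
    print(name, round(value, 3))
```

The attributions sum to the difference between the score with all features present and the score with none, and the interaction term is split equally between the two features that produce it. This additivity is what makes Shapley values convenient for ranking features by their contribution to a prediction.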