An integrated approach of feature selection and machine learning for early detection of breast cancer

Abstract

Breast cancer is among the most prevalent cancers in women worldwide, and treatment efficacy depends heavily on early identification and diagnosis, which in turn improves patients' survival prospects. With the increasing application of machine learning in medicine, algorithm-based diagnostic tools offer new possibilities for the early prediction of breast cancer. In this study, we introduce a novel feature selection approach that uses Shapley additive explanation (SHAP) values as the ranking criterion for Recursive Feature Elimination (RFE), with a Random Forest (RF) algorithm inside the RFE framework. To address class imbalance, we applied Borderline-SMOTE1. The proposed method was evaluated with five machine learning models, K-Nearest Neighbor (KNN), Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), and Light Gradient Boosting Machine (LightGBM), on the Wisconsin Breast Cancer Diagnosis (WBCD) datasets. The hyperparameters of all five models were optimized using the Particle Swarm Optimization (PSO) algorithm. With 26 features filtered by the proposed algorithm, the LightGBM-PSO model performed best: it achieved an accuracy of 99.0% in differentiating benign from malignant cases, with a specificity and precision of 100%, a recall of 97.40%, an F-measure of 98.68%, an AUC of 0.9870, and a 10-fold cross-validation accuracy of 0.9808. We then developed an online tool (https://breast-cancer-prediction-tool-cgbjlhkns7yig6bmzvztmc.streamlit.app/) based on this model for predicting the risk of breast cancer. Feature selection with the proposed algorithm, combined with PSO-based optimization of the LightGBM model, can significantly enhance the accuracy of breast cancer prediction, which could potentially improve the prognosis of patients diagnosed with breast cancer.
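
The abstract names PSO as the hyperparameter optimizer but gives no details of its configuration. As a rough illustration of how such a search works, the following is a minimal standard-library sketch of a basic global-best PSO. It is not the authors' implementation: the function names, the swarm settings (`inertia`, `c_cognitive`, `c_social`, swarm size, iteration count), and the toy objective standing in for a model's validation error are all assumptions made for this example.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=100,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Minimize `objective` over a box `bounds` = [(low, high), ...].

    Basic global-best PSO; all default settings here are illustrative
    assumptions, not values taken from the study.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box; velocities start at zero.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position and value so far.
    pbest_pos = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    # The swarm shares one global best.
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest_pos, gbest_val = pbest_pos[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal best
                # + pull toward global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (pbest_pos[i][d] - pos[i][d])
                             + c_social * r2 * (gbest_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move, then clamp the coordinate back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest_pos[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest_pos, gbest_val = pos[i][:], val
    return gbest_pos, gbest_val


def toy_validation_error(params):
    """Hypothetical stand-in for a model's validation error.

    In a real pipeline this would train a model with the given
    hyperparameters and return, e.g., 1 - cross-validated accuracy.
    """
    learning_rate, subsample = params
    return (learning_rate - 0.1) ** 2 + (subsample - 0.8) ** 2


# Search two hypothetical hyperparameters within assumed ranges.
best_params, best_error = pso_minimize(
    toy_validation_error, bounds=[(0.01, 0.5), (0.5, 1.0)]
)
print(best_params, best_error)
```

To tune a real classifier, `toy_validation_error` would be replaced by a function that fits the model with the candidate hyperparameters and returns a cross-validated error; integer-valued hyperparameters would additionally need rounding inside that function.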
