Unlocking stroke prediction: Harnessing projection-based statistical feature extraction with ML algorithms

Abstract

Non-communicable diseases, such as cardiovascular disease, cancer, chronic respiratory diseases, and diabetes, are responsible for approximately 71% of all deaths worldwide. Stroke, a cerebrovascular disorder, ranks among the top three causes of death and is one of the leading contributors to this burden. Early recognition of symptoms can encourage a balanced lifestyle and provide essential information for stroke prediction. Machine learning (ML) is a key tool that helps physicians identify stroke patients and their risk factors. However, because the data are measured on different scales and carry different probability distributional assumptions, ML-based algorithms struggle to detect risk factors. Furthermore, learning algorithms struggle with the complexity of risk factors that have high-dimensional features. In this study, rigorous statistical tests are used to identify risk factors, and two approaches, PCA-FA (Integration of Principal Components and Factors) and FPCA (Factor-Based PCA), are proposed to project the features into representations that improve learning-algorithm performance. The study dataset contains 5110 patient records with clinical, lifestyle, and genetic attributes, allowing a comprehensive analysis of potential risk factors associated with stroke. At a significance level of p < 0.05, the chi-square test and the independent-samples t-test identified age, heart_disease, hypertension, work_type, ever_married, bmi, and smoking_status as risk factors for stroke. Among the predictive models built with the proposed feature extraction techniques, random forest combined with the PCA-FA method gives the best results, with an accuracy of 92.55% and an AUC of 98.15%. Compared with existing work, prediction accuracy improves by 2.19% to 19.03%. Additionally, a stacking ensemble-based classification algorithm makes the prediction results more robust and reproducible.
We also developed a web-based application, based on the findings of this study, that doctors could use as an additional tool for diagnosing stroke risk.
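
The risk-factor screening summarized above (a chi-square test for categorical attributes, an independent-samples t-test for continuous ones, and a p < 0.05 cut-off) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names (chi2_sf, chi_square_test, t_test_large_sample, screen_risk_factors) are hypothetical, the outcome is assumed to be coded 1 for stroke and 0 otherwise, the chi-square statistic omits Yates' continuity correction, and the t-test p-value uses a normal approximation to the t distribution, which is only reasonable for large groups. The data are a small synthetic example, not the study's dataset, and the sketch does not cover the PCA-FA/FPCA projections, the random forest, or the stacking steps.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for X ~ chi-square(df), i.e. Q(df/2, x/2)."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1:
        # Series for the lower regularized gamma P(a, z); Q = 1 - P.
        term = total = 1.0 / a
        ap = a
        for _ in range(1000):
            ap += 1
            term *= z / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz) for the upper regularized gamma Q(a, z).
    tiny = 1e-300
    b = z + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def chi_square_test(feature, target):
    """Pearson chi-square test of independence (no Yates correction)."""
    n = len(feature)
    row, col = Counter(feature), Counter(target)
    cell = Counter(zip(feature, target))
    stat = 0.0
    for r, nr in row.items():
        for c, nc in col.items():
            expected = nr * nc / n
            stat += (cell[(r, c)] - expected) ** 2 / expected
    df = (len(row) - 1) * (len(col) - 1)
    return stat, chi2_sf(stat, df)


def t_test_large_sample(values, target, positive=1):
    """Pooled-variance two-sample t statistic. ASSUMPTION: the two-sided
    p-value uses a normal approximation, valid only for large groups."""
    g1 = [v for v, y in zip(values, target) if y == positive]
    g0 = [v for v, y in zip(values, target) if y != positive]
    n1, n0 = len(g1), len(g0)
    m1, m0 = sum(g1) / n1, sum(g0) / n0
    ss = sum((v - m1) ** 2 for v in g1) + sum((v - m0) ** 2 for v in g0)
    pooled_var = ss / (n1 + n0 - 2)
    t = (m1 - m0) / math.sqrt(pooled_var * (1 / n1 + 1 / n0))
    return t, math.erfc(abs(t) / math.sqrt(2))  # 2 * (1 - Phi(|t|))


def screen_risk_factors(columns, target, categorical, alpha=0.05):
    """Keep features whose test p-value is below alpha (hypothetical helper)."""
    selected = []
    for name, values in columns.items():
        if name in categorical:
            _, p = chi_square_test(values, target)
        else:
            _, p = t_test_large_sample(values, target)
        if p < alpha:
            selected.append(name)
    return selected


# Synthetic toy data (NOT the study's 5110-record dataset): 100 stroke cases
# (target = 1) followed by 900 non-stroke cases (target = 0).
target = [1] * 100 + [0] * 900
hypertension = [1] * 60 + [0] * 40 + [1] * 90 + [0] * 810
gender = ["F", "M"] * 500
age = [70 + i % 10 for i in range(100)] + [40 + i % 30 for i in range(900)]

columns = {"age": age, "gender": gender, "hypertension": hypertension}
print(screen_risk_factors(columns, target, categorical={"gender", "hypertension"}))
# ['age', 'hypertension']
```

On the toy data, gender is split evenly between the two groups (chi-square statistic 0, p = 1), so only age and hypertension pass the 0.05 threshold. For small samples, the exact Student t distribution from a statistics library should replace the normal approximation.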