SGO enhanced random forest and extreme gradient boosting framework for heart disease prediction

Abstract

Cardiovascular disease (CVD) remains a leading global health concern, accounting for approximately 31.5% of deaths worldwide. According to the World Health Organization (WHO), over 20.5 million people die of CVD each year, a figure projected to rise to 24.2 million by 2030. Early diagnosis is critical and can be facilitated by monitoring key risk factors such as cholesterol levels, blood pressure, diabetes, and obesity. This study proposes a heart disease prediction (HDP) model employing Random Forest (RF) and eXtreme Gradient Boosting (XGB) classifiers. The hyperparameters of both classifiers are tuned with the Social Group Optimization (SGO) algorithm. The model was developed and validated using the Cleveland and Statlog datasets from the UCI repository. Before optimization, RF yielded an accuracy (Acc.) of 84% and a ROC-AUC score of 92.03% on the Cleveland dataset, and 88.09% Acc. with a ROC-AUC of 97.50% on Statlog. The XGB classifier achieved 81.97% Acc. and a ROC-AUC of 90.73% on Cleveland, and 92.86% Acc. with a ROC-AUC of 96.14% on Statlog. After SGO-based optimization, RF improved to 95.08% Acc. and 95.26% ROC-AUC on Cleveland, and 95.24% Acc. with 98.18% ROC-AUC on Statlog. Similarly, the optimized XGB classifier reached 93.44% Acc. and 95.24% ROC-AUC on Cleveland, and 97.62% Acc. with 97.50% ROC-AUC on Statlog. These results highlight the effectiveness of SGO in enhancing machine learning performance for medical prediction problems. However, the study has certain limitations. The evaluation was conducted solely on two benchmark datasets, which may not fully reflect the diversity and complexity of real-world clinical populations. Furthermore, external validation using independent or real-time clinical data was not performed, which may limit the generalizability of the results. The computational cost associated with SGO optimization was also not assessed. Future research should focus on validating the model across broader datasets, assessing real-world applicability, and analyzing computational efficiency to ensure scalability and clinical adoption.
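The abstract does not say which hyperparameters were tuned or how SGO was configured, so the following is only a minimal sketch of how SGO-style hyperparameter search could look. It follows the commonly described two-phase form of SGO: an improving phase that moves each candidate toward the current best, and an acquiring phase that exchanges information with a randomly chosen peer. The function names, the search ranges, the self-introspection value `c=0.2`, the three RF hyperparameters, and the `toy_loss` objective are all assumptions made for illustration; none of them come from the paper.

```python
import random


def sgo_minimize(objective, bounds, pop_size=20, iterations=50, c=0.2, seed=0):
    """Minimize `objective` inside box `bounds` with a simplified SGO loop.

    Illustrative sketch only; not the authors' implementation.
    """
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x):
        # Keep every coordinate inside its (low, high) range.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    def best_index():
        return min(range(pop_size), key=fit.__getitem__)

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]

    for _ in range(iterations):
        # Improving phase: each person moves toward the current best person.
        best = pop[best_index()][:]
        for i in range(pop_size):
            cand = clip([c * pop[i][j] + rng.random() * (best[j] - pop[i][j])
                         for j in range(dim)])
            f = objective(cand)
            if f < fit[i]:  # greedy acceptance: keep only improvements
                pop[i], fit[i] = cand, f

        # Acquiring phase: interact with one random peer and with the best.
        best = pop[best_index()][:]
        for i in range(pop_size):
            r = rng.choice([k for k in range(pop_size) if k != i])
            # Move away from a worse peer, toward a better one.
            sign = 1.0 if fit[i] < fit[r] else -1.0
            cand = clip([pop[i][j]
                         + sign * rng.random() * (pop[i][j] - pop[r][j])
                         + rng.random() * (best[j] - pop[i][j])
                         for j in range(dim)])
            f = objective(cand)
            if f < fit[i]:
                pop[i], fit[i] = cand, f

    b = best_index()
    return pop[b], fit[b]


def decode_rf_params(position):
    """Map a continuous position to integer RF hyperparameters (assumed set)."""
    n_estimators, max_depth, min_samples_split = position
    return {
        "n_estimators": int(round(n_estimators)),
        "max_depth": int(round(max_depth)),
        "min_samples_split": int(round(min_samples_split)),
    }


def toy_loss(position):
    """Stand-in objective with an arbitrary optimum; replace with a CV error."""
    target = (200.0, 8.0, 4.0)  # arbitrary values, not from the paper
    scale = (490.0, 18.0, 8.0)
    return sum(((p - t) / s) ** 2 for p, t, s in zip(position, target, scale))


if __name__ == "__main__":
    # Assumed search ranges: n_estimators, max_depth, min_samples_split.
    search_bounds = [(10, 500), (2, 20), (2, 10)]
    best_pos, best_loss = sgo_minimize(toy_loss, search_bounds, iterations=30)
    print(decode_rf_params(best_pos), best_loss)
```

In a real experiment, `toy_loss` would be replaced by something like one minus the mean cross-validated accuracy of a classifier built from `decode_rf_params(position)`, for example via scikit-learn's `cross_val_score` with a `RandomForestClassifier`. Because every objective call then trains several models, the total cost grows roughly with `2 * pop_size * iterations` evaluations, which is the computational cost the abstract notes was not assessed.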
