Predicting cardiovascular disease among diabetic patients in Ethiopia using machine learning models: evidence from Ethiopian Public Health Institute data (2024/2025)

Abstract

INTRODUCTION: Cardiovascular disease (CVD) is the leading cause of death among individuals with diabetes, accounting for nearly 50% of diabetes-related mortality. In Ethiopia, the burden of diabetes is increasing, and recent studies report a CVD prevalence of 37.26% among diabetic patients, yet predictive tools for identifying those at highest risk of developing CVD are lacking. This study employed machine learning to predict CVD among Ethiopian diabetic patients using Ethiopian Public Health Institute (EPHI) datasets, with a focus on identifying the most influential risk factors to inform public health decision-making.
OBJECTIVE: The main objective of this study is to predict CVD among diabetic patients in Ethiopia using machine learning techniques.
METHOD: The dataset comprised 9030 instances with 22 features sourced from the EPHI. Prediction of CVD incorporated socio-demographic, behavioral, and clinical measurement data. Six models were employed: logistic regression (LR), decision tree, support vector machine, random forest, gradient boosting machine (GBM), and artificial neural network. The models were trained on 80% of the data and tested on the remaining 20%. The analysis was conducted using Python 3.10.
RESULTS: GBM demonstrated the highest overall performance, achieving an accuracy of 93%, followed closely by LR with 90% accuracy. In terms of precision, GBM and LR performed comparably, while LR achieved the highest recall at 88%. GBM attained an F1 score of 82%, indicating a strong balance between precision and recall. Additionally, receiver operating characteristic (ROC) analysis showed that GBM had the largest area under the curve (AUC), 0.96, reflecting superior discriminative ability.
CONCLUSION: The gradient boosting machine (GBM) model outperformed the other models, achieving an accuracy of 93%. The most influential factors in the GBM model were total cholesterol, hypertension, and fasting blood glucose level. Pending external validation, the GBM model shows potential for future integration into clinical decision-support systems for early prediction of cardiovascular disease in individuals with diabetes.
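
The abstract compares models by accuracy, precision, recall, F1 score, and AUC. As a reading aid, the following is a minimal sketch of how those five quantities are computed for a binary CVD classifier. It is not the authors' code: the function names, the 0.5 decision threshold, and the eight-patient toy data are all assumptions made for illustration, and none of the numbers come from the EPHI dataset.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 = CVD and 0 = no CVD."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 derived from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("no samples to evaluate")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case; tied scores count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true CVD status and a model's predicted risk scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics)
print(f"AUC = {auc}")
```

On this toy data, accuracy, precision, recall, and F1 all equal 0.75, while the AUC is 0.9375. The difference arises because the AUC is computed from the ranking of the risk scores and does not depend on the chosen threshold, which is why a model can rank first on AUC without leading on every thresholded metric.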
