Cardiovascular risk prediction via ensemble machine learning and oversampling methods

Abstract

Cardiovascular diseases are a leading cause of death worldwide, and hypertension, obesity, and other factors contribute significantly to cardiovascular risk. Artificial intelligence (AI) has emerged as a valuable tool for early detection, offering predictive models that outperform traditional methods. This study analyzed a dataset of 709 individuals from Ecuador, comprising demographic and clinical variables, to estimate cardiovascular risk. During preprocessing, records with missing values and duplicate records were removed, and highly correlated variables were excluded to reduce multicollinearity and prevent overfitting. Several machine learning algorithms (Decision Trees, Random Forest, Gradient Boosting, Extreme Gradient Boosting, LightGBM, Extra Trees, AdaBoost, and Bagging) were compared, and class imbalance was addressed using the Synthetic Minority Over-sampling Technique (SMOTE) and a hybrid approach combining random oversampling (ROS) with SMOTE (ROS–SMOTE). Gradient Boosting combined with the hybrid technique achieved the best performance, with an accuracy of 0.87, a precision of 0.81, a recall of 0.74, and an F1-score of 0.75. Its superior performance is attributed to its sequential error-correction mechanism and built-in regularization strategies, which reduce overfitting and improve generalization on noisy or imbalanced datasets. These findings demonstrate the potential of AI-based models to improve the early detection and management of cardiovascular disease and highlight the importance of anthropometric, clinical, and blood pressure variables in predicting cardiovascular risk.
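The preprocessing described in the abstract (dropping incomplete and duplicate records, then excluding highly correlated variables) can be sketched in plain Python as below. This is an illustrative sketch, not the study's code: the record layout (a list of dicts with `None` marking a missing value), the helper names `pearson` and `preprocess`, the example column names, and the correlation cutoff of 0.9 are all assumptions, since the abstract does not state which threshold was used or which member of a correlated pair was dropped.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n < 2:
        return 0.0  # not enough data to estimate a correlation
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / (sx * sy)


def preprocess(rows, features, threshold=0.9):
    """Drop incomplete and duplicate records, then drop one feature of each
    highly correlated pair. `threshold` is a hypothetical |r| cutoff."""
    # 1. Keep only records with no missing (None) value in any feature.
    complete = [rec for rec in rows if all(rec.get(f) is not None for f in features)]

    # 2. Remove exact duplicate records, keeping the first occurrence.
    seen, unique = set(), []
    for rec in complete:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    # 3. For each pair still present with |r| above the cutoff, drop the later feature.
    kept = list(features)
    for a, b in combinations(features, 2):
        if a in kept and b in kept:
            corr = pearson([rec[a] for rec in unique], [rec[b] for rec in unique])
            if abs(corr) > threshold:
                kept.remove(b)

    # Non-feature columns (e.g. the outcome label) are carried through unchanged.
    dropped = set(features) - set(kept)
    cleaned = [{k: v for k, v in rec.items() if k not in dropped} for rec in unique]
    return cleaned, kept
```

For example, with hypothetical columns `weight_kg`, `bmi`, and `sbp`, body weight and BMI are nearly collinear within a small sample, so `bmi` would be dropped while systolic blood pressure is kept.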
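The abstract does not specify how ROS and SMOTE were combined in the hybrid technique. One plausible reading, assumed here purely for illustration, is to close part of the gap between the majority and minority classes by random oversampling (duplicating existing minority records) and the remainder with SMOTE, which creates synthetic records on the line segment between a minority sample and one of its k nearest minority neighbours. The helper names, the 50/50 split (`ros_fraction`), k = 5, and the restriction to binary labels are hypothetical choices, not details taken from the study.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points, each on the segment between a random
    minority point and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)  # needs at least two minority points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority points (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor, uniform in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def hybrid_ros_smote(X, y, minority_label, ros_fraction=0.5, k=5, seed=0):
    """Balance a binary dataset: a fraction `ros_fraction` of the class gap is
    filled by duplicating minority records (ROS), the rest by SMOTE.
    This split is a hypothetical scheme, not the study's documented method."""
    rng = random.Random(seed)
    minority = [tuple(x) for x, label in zip(X, y) if label == minority_label]
    n_majority = sum(1 for label in y if label != minority_label)
    gap = n_majority - len(minority)
    X_out = [tuple(x) for x in X]
    y_out = list(y)
    if gap <= 0:
        return X_out, y_out  # already balanced, or the "minority" is larger
    n_ros = int(gap * ros_fraction)
    duplicated = [rng.choice(minority) for _ in range(n_ros)]
    # SMOTE interpolates between original minority points only, so duplicates
    # from the ROS step never serve as zero-distance neighbours.
    synthetic = smote(minority, gap - n_ros, k=k, rng=rng)
    return X_out + duplicated + synthetic, y_out + [minority_label] * gap
```

Whatever the exact scheme, oversampling should be applied to the training split only; resampling before a train/test split lets copies or interpolations of test records leak into training and inflates the reported metrics.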
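The "sequential error correction" and regularization credited to Gradient Boosting can be illustrated with a deliberately simplified booster for binary labels. Each round fits a one-split decision stump to the current residuals (the negative gradient of the log-loss, y − p), and a learning rate shrinks each stump's contribution, which is one form of the regularization the abstract mentions. This is a teaching sketch under these assumptions, with hypothetical helper names; it is not the model used in the study, and real implementations typically use deeper trees and further refinements such as second-order leaf values and subsampling.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) of the single split
    that minimises squared error against the residuals."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [r for x, r in zip(X, residuals) if x[f] <= t]
            right = [r for x, r in zip(X, residuals) if x[f] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lv, rv)
    if best is None:  # no valid split: predict the mean residual everywhere
        m = sum(residuals) / len(residuals)
        return 0, math.inf, m, m
    return best[1:]


def gradient_boost(X, y, n_rounds=50, learning_rate=0.1):
    """Fit an additive model of stumps on the log-odds scale for labels y in {0, 1}."""
    p0 = sum(y) / len(y)  # requires both classes to be present
    f0 = math.log(p0 / (1 - p0))  # start from the base-rate log-odds
    F = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log-loss w.r.t. F: the current errors y - p.
        residuals = [yi - sigmoid(Fi) for yi, Fi in zip(y, F)]
        f, t, lv, rv = fit_stump(X, residuals)
        stumps.append((f, t, lv, rv))
        # Shrinkage: each stump corrects only a fraction of the remaining error.
        F = [Fi + learning_rate * (lv if x[f] <= t else rv) for x, Fi in zip(X, F)]
    return f0, learning_rate, stumps


def predict_proba(model, x):
    """Probability of the positive class for a single feature vector x."""
    f0, learning_rate, stumps = model
    score = f0 + sum(learning_rate * (lv if x[f] <= t else rv) for f, t, lv, rv in stumps)
    return sigmoid(score)
```

A smaller `learning_rate` makes each round correct less of the remaining error, so more rounds are typically needed, but the fitted model usually overfits less.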
