Evaluating machine learning models for stroke prediction based on clinical variables

基于临床变量评估用于中风预测的机器学习模型

阅读:1

Abstract

INTRODUCTION: Stroke remains one of the leading causes of global mortality and long-term disability, driving the urgent need for accurate and early risk prediction tools. Traditional models such as the Framingham Stroke Risk Score have provided foundational insights into stroke prevention but are constrained by linear assumptions and limited adaptability to complex real-world data. In contrast, machine learning (ML) techniques offer the ability to model non-linear relationships and interactions among diverse clinical and demographic variables, supporting more personalized and flexible risk prediction. METHODS: This study evaluates five supervised ML algorithms, Logistic Regression, Random Forest, Gradient Boosting, Support Vector Machine (SVM), and K-Nearest Neighbours (KNN), using a publicly available dataset from Kaggle. Following class imbalance correction, models were assessed using multiple metrics including accuracy, ROC-AUC, and confusion matrices. RESULTS: Logistic Regression and Gradient Boosting achieved the highest accuracy (95.11%) and ROC-AUC (0.836), although all models demonstrated poor recall, reflecting challenges in identifying rare stroke cases. Feature importance analysis using the Random Forest model identified age, average glucose level, and BMI as the most influential predictors of stroke, aligning with the Metabolic Syndrome Hypothesis and previous epidemiological findings. DISCUSSION: These findings underscore both the promise and current limitations of ML in stroke risk prediction and highlight the need for future research leveraging multi-modal datasets and advanced algorithmic strategies to enhance sensitivity and clinical utility.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。