Abstract
This study aimed to develop a machine learning model that predicts stroke risk from physical activity and other risk factors. We retrospectively analyzed 134 stroke patients treated at the Department of Neurology, the First Affiliated Hospital of Nanjing Medical University, from July 1, 2021, to May 31, 2023, and 354 non-stroke individuals recruited from the Fenghuang Community Health Screening Program in Nanjing during the same period. Eight machine learning algorithms were used to build prediction models: extreme gradient boosting, support vector machine, random forest (RF), neural network, naive Bayes, logistic regression, K-nearest neighbor, and decision tree. Variables were selected using the least absolute shrinkage and selection operator (LASSO) and multivariable logistic regression analysis. The models were evaluated using receiver operating characteristic (ROC) curves, area under the ROC curve, precision-recall (PR) curves, area under the PR curve, accuracy, sensitivity, specificity, and precision. Shapley Additive Explanations (SHAP) were employed to determine feature importance. The RF algorithm performed well, with an area under the ROC curve of 0.96, an area under the PR curve of 0.92, a specificity of 0.97, and a precision of 0.92. SHAP analysis revealed that the number of weekly exercise days had the greatest impact on predicted stroke risk, followed by calf circumference, past medical history, gender, body mass index, and the Strength, Assistance with walking, Rising from a chair, Climbing stairs, and Falls (SARC-F) score. The RF algorithm demonstrated strong predictive performance for stroke risk and may help guide clinical decision-making.
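
As an illustration of the evaluation metrics named above, the following minimal Python sketch computes accuracy, sensitivity, specificity, precision, and the area under the ROC curve from binary labels and predicted scores. It is not the study's code: the function names, the 0.5 decision threshold, and the eight toy observations are assumptions made purely for illustration, and the pairwise (Mann-Whitney) formulation is only one standard way to obtain the area under the ROC curve.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and precision for binary labels (1 = stroke).

    Assumes both classes occur and at least one positive prediction is made,
    so that no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # recall among true stroke cases
        "specificity": tn / (tn + fp),  # recall among non-stroke cases
        "precision": tp / (tp + fp),    # share of predicted strokes that are real
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the fraction of (positive, negative) pairs
    in which the positive case receives the higher score; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): 3 stroke and 5 non-stroke cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = confusion_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, round(auc, 3))
```

On the toy data the sketch yields an accuracy of 0.75, a specificity of 0.80, and an area under the ROC curve of about 0.93; these numbers only demonstrate the calculations and say nothing about the study's results.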