Abstract
Machine learning (ML) risk prediction models for post-stroke cognitive impairment (PSCI) remain far from optimal. This study aimed to develop a reliable ML model for predicting PSCI in Chinese individuals. We collected data on 494 individuals who were diagnosed with acute ischemic stroke (AIS) and hospitalized for this condition from January 2022 to November 2023 at a Chinese medical institution. Cognitive function was assessed 3-6 months after stroke, and PSCI was determined from Mini-Mental State Examination (MMSE) or Montreal Cognitive Assessment (MoCA) scores. All samples were randomly divided into a training set (70%) and a validation set (30%). The least absolute shrinkage and selection operator (LASSO) penalty and logistic regression (LR) were used to select the most predictive features for PSCI from 49 common clinical parameters collected on admission. We trained seven ML models on the selected variables and compared their performance: LR, Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), Gaussian Naive Bayes (GNB), Multilayer Perceptron (MLP), and Support Vector Machine (SVM). We used tenfold cross-validation to measure each model's area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, accuracy, F1 score, and average precision (AP). SHapley Additive exPlanations (SHAP) analysis was used to explain the predictions of the best-performing model. PSCI was identified in 58.50% of the 494 eligible AIS patients. Age, National Institutes of Health Stroke Scale (NIHSS) score, 24-item Hamilton Depression Scale (HAMD-24) score, Pittsburgh Sleep Quality Index (PSQI) score, albumin (ALB), fasting blood glucose (FBG), hypertension, paraventricular lesion, and number of lesions were significant predictors of PSCI.
The XGBoost model achieved an AUROC of 0.980, outperforming the other models (LR: 0.808, LightGBM: 0.800, AdaBoost: 0.893, GNB: 0.789, MLP: 0.745, and SVM: 0.868). Using age, NIHSS, HAMD-24, PSQI, ALB, FBG, hypertension, paraventricular lesion, and number of lesions as predictors, the XGBoost model effectively predicts mild to moderate cognitive impairment 3-6 months post-stroke. This tool enables early identification of at-risk patients, facilitating timely clinical interventions.
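For readers less familiar with the evaluation metrics reported above, the following is a minimal sketch of how AUROC, sensitivity, specificity, accuracy, and F1 score can be computed from true labels and predicted probabilities. It is not the study's code: the function names `auroc` and `threshold_metrics`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy, and F1 at a fixed cut-off.
    The 0.5 default is an illustrative assumption, not the study's choice."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Toy data (hypothetical): 1 = PSCI, 0 = no PSCI.
    labels = [1, 1, 1, 0, 0, 1, 0, 0]
    probs = [0.9, 0.8, 0.4, 0.6, 0.3, 0.7, 0.2, 0.1]
    print(auroc(labels, probs))              # 0.9375
    print(threshold_metrics(labels, probs))  # every metric equals 0.75 here
```

AUROC is threshold-free, whereas sensitivity, specificity, accuracy, and F1 depend on the chosen cut-off, which is why the two kinds of metric are usually reported together. In a tenfold cross-validation, these quantities would typically be computed on each held-out fold and then averaged.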