Abstract
Objective: Neuropsychiatric complications after stroke can impede recovery and reduce quality of life. Current methods for predicting poststroke anxiety (PSA) are limited by inadequate feature selection and a lack of interpretability. This study aimed to develop an interpretable machine learning model that uses a wide range of clinical data to identify patients at high risk of PSA early, enabling personalized interventions. Methods: This retrospective multicenter study included 238 stroke patients from 10 Chinese hospitals, treated between 1 January 2022 and 11 June 2025. Demographic, clinical, biochemical, and psychosocial data were collected. Features were selected by univariate analysis followed by least absolute shrinkage and selection operator (LASSO) regression. Seven machine learning models (logistic regression, extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), random forest, decision tree, K-nearest neighbors, and a stacking ensemble) were constructed and evaluated using cross-validation. Feature importance was assessed with SHAP (SHapley Additive exPlanations), and a nomogram was built from the final model. Results: Of the 238 patients, 109 were diagnosed with PSA. On the test set, the logistic regression model performed best, with an area under the receiver operating characteristic curve (AUC) of 0.981, accuracy of 0.917, sensitivity of 0.867, specificity of 0.952, and an F1 score of 0.897. SHAP analysis identified the following key predictors: recurrent stroke, income level, payment type, occupational stress, overwork, sleep quality, continuous drinking history, history of hypertension, diabetes, hyperlipidemia, hyperhomocysteinemia, white blood cell (WBC) count, total cholesterol (TC), low-density lipoprotein (LDL), fibrinogen (FIB), activated partial thromboplastin time (APTT), National Institutes of Health Stroke Scale (NIHSS) score, and Barthel index.
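The threshold-based metrics reported above all follow from a single 2x2 confusion matrix on the test set. The sketch below shows those definitions. The counts are HYPOTHETICAL: the abstract does not report the test-set size or the confusion matrix, and 13 true positives, 2 false negatives, 20 true negatives, and 1 false positive (36 patients) are simply one combination that reproduces the reported values after rounding to three decimals. AUC cannot be derived this way, because it summarizes performance across all thresholds rather than at one cut-off.

```python
def classification_metrics(tp, fn, tn, fp):
    """Threshold-based metrics from a 2x2 confusion matrix (positive class = PSA)."""
    sensitivity = tp / (tp + fn)  # recall / true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    precision = tp / (tp + fp)    # positive predictive value
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# HYPOTHETICAL counts, not reported in the abstract: 15 PSA and 21 non-PSA
# test patients, one combination consistent with the reported rounded metrics.
metrics = classification_metrics(tp=13, fn=2, tn=20, fp=1)
print({name: round(value, 3) for name, value in metrics.items()})
```

Note that F1 depends on precision, which the abstract does not report; with these assumed counts precision is 13/14, about 0.929.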
A nomogram incorporating the top 10 SHAP-ranked features was developed to support clinical decision-making. Conclusion: The machine learning model predicted PSA risk with high accuracy and interpretability. By combining SHAP analysis with nomogram visualization, it offers clinicians a practical tool for identifying patients at high risk of PSA and tailoring management strategies to improve poststroke outcomes.
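A nomogram built on a logistic regression model is a graphical form of the linear predictor: each predictor's contribution (coefficient times value) is converted to points, the points are summed, and the total is mapped back to a probability through the logistic function. The sketch below illustrates one common scaling convention, in which the predictor with the widest possible contribution spans 0 to 100 points. Everything in it is an assumption for illustration: the three generic predictors, their coefficients, ranges, and the intercept are invented and are NOT the study's features or estimates, and the abstract does not state which scaling the authors used.

```python
import math

# HYPOTHETICAL model: intercept, coefficients and predictor ranges are invented
# for illustration and are NOT estimates from the study.
INTERCEPT = -3.0
COEFS = {"binary_factor": 1.2, "score_a": 0.1, "score_b": -0.03}
RANGES = {"binary_factor": (0, 1), "score_a": (0, 42), "score_b": (0, 100)}


def points_scale(coefs, ranges):
    """Points per unit of linear predictor: the widest-effect predictor spans 0-100."""
    spans = [abs(coefs[k]) * (hi - lo) for k, (lo, hi) in ranges.items()]
    return 100.0 / max(spans)


def reference_value(beta, lo, hi):
    """Predictor value that contributes the fewest points (assigned 0 points)."""
    return lo if beta >= 0 else hi


def total_points(patient, coefs=COEFS, ranges=RANGES):
    """Sum of per-predictor points for one patient."""
    scale = points_scale(coefs, ranges)
    return sum(
        scale * beta * (patient[k] - reference_value(beta, *ranges[k]))
        for k, beta in coefs.items()
    )


def risk_from_points(points, intercept=INTERCEPT, coefs=COEFS, ranges=RANGES):
    """Map total points back to a predicted probability via the logistic link."""
    scale = points_scale(coefs, ranges)
    # Linear predictor at the zero-point reference values of every predictor.
    baseline = sum(beta * reference_value(beta, *ranges[k]) for k, beta in coefs.items())
    linear_predictor = intercept + baseline + points / scale
    return 1.0 / (1.0 + math.exp(-linear_predictor))


patient = {"binary_factor": 1, "score_a": 10, "score_b": 60}
pts = total_points(patient)
print(round(pts, 1), round(risk_from_points(pts), 3))
```

Because the points are a linear rescaling of the linear predictor, the probability read off the nomogram equals the model's own predicted probability; the chart only trades arithmetic for visual lookup.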