Development of an explainable machine learning model for predicting poststroke anxiety: A multicenter study using Shapley Additive Explanations and nomogram visualization



Abstract

Objective: Neuropsychiatric complications following a stroke can impede recovery and reduce quality of life. Current predictive methods for poststroke anxiety (PSA) are limited by inadequate feature selection and a lack of interpretability. This study aimed to develop an interpretable machine learning model that uses a wide range of clinical data to detect high-risk PSA patients early, enabling personalized interventions.

Methods: This retrospective multicenter study included 238 stroke patients from 10 Chinese hospitals, covering the period from 1 January 2022 to 11 June 2025. Demographic, clinical, biochemical, and psychosocial data were gathered. Feature selection involved univariate analysis followed by least absolute shrinkage and selection operator (LASSO) regression. Seven machine learning models (logistic regression, extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), random forest, decision tree, K-nearest neighbors, and stacking) were constructed and assessed using cross-validation. Feature importance was determined using SHAP (Shapley Additive Explanations), and a nomogram was developed based on the final model.

Results: Among the 238 patients, 109 (45.8%) were diagnosed with PSA. In the test set, the logistic regression model exhibited the best performance, achieving an area under the curve (AUC) of 0.981, accuracy of 0.917, sensitivity of 0.867, specificity of 0.952, and an F1 score of 0.897. SHAP analysis identified recurrent stroke, income level, payment type, occupational stress, overwork, sleep quality, continuous drinking history, history of hypertension, diabetes, hyperlipidemia, hyperhomocysteinemia, white blood cell (WBC) count, total cholesterol (TC), low-density lipoprotein (LDL), fibrinogen (FIB), activated partial thromboplastin time (APTT), National Institutes of Health Stroke Scale (NIHSS) score, and Barthel index as crucial predictors.
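
To make the SHAP step concrete for a logistic regression model, the following is a minimal sketch rather than the authors' implementation. It relies on the known result that, for a linear model with features treated as independent, the SHAP value of feature j in log-odds space is the coefficient times the feature's deviation from its background mean. The coefficients, the three selected feature names, and the four patients are hypothetical values chosen for illustration; the abstract reports none of them.

```python
from statistics import mean

# Hypothetical log-odds coefficients for three of the predictors named in the
# abstract. Illustrative values only; the source does not report coefficients.
COEF = {"nihss": 0.30, "barthel_index": -0.04, "recurrent_stroke": 1.10}
INTERCEPT = -1.0

# Toy, made-up patients (NOT study data), used as the background sample.
PATIENTS = [
    {"nihss": 4, "barthel_index": 85, "recurrent_stroke": 0},
    {"nihss": 12, "barthel_index": 40, "recurrent_stroke": 1},
    {"nihss": 7, "barthel_index": 60, "recurrent_stroke": 0},
    {"nihss": 15, "barthel_index": 30, "recurrent_stroke": 1},
]


def log_odds(x):
    """Linear predictor of the logistic model for one patient."""
    return INTERCEPT + sum(COEF[f] * x[f] for f in COEF)


def linear_shap(x, background):
    """SHAP values in log-odds space, assuming independent features:
    phi_j = w_j * (x_j - mean of x_j over the background sample)."""
    return {
        f: COEF[f] * (x[f] - mean(p[f] for p in background))
        for f in COEF
    }


def rank_features(background):
    """Order features by mean absolute SHAP value (global importance)."""
    importance = {
        f: mean(abs(linear_shap(p, background)[f]) for p in background)
        for f in COEF
    }
    return sorted(importance, key=importance.get, reverse=True)


if __name__ == "__main__":
    print(linear_shap(PATIENTS[1], PATIENTS))
    print(rank_features(PATIENTS))
```

By construction, a patient's SHAP values sum to the difference between that patient's log-odds and the average log-odds over the background sample, which is the additivity property that makes the attribution readable at the bedside.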
A nomogram incorporating the top 10 SHAP-ranked features was devised to assist in clinical decision-making.

Conclusion: The machine learning model demonstrated high accuracy and interpretability in predicting PSA risk. Through the integration of SHAP analysis and nomogram visualization, it offers a practical tool for clinicians to recognize high-risk PSA patients and customize management strategies to improve poststroke outcomes.
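
A nomogram for a logistic regression model is essentially a rescaling of the linear predictor into points. The sketch below shows one common convention, assumed here rather than taken from the study: the feature with the largest possible log-odds swing is assigned 100 points, every other feature is scaled relative to it, and the total is mapped back to a probability. The coefficients, value ranges, and the choice of three features are hypothetical; the study's nomogram uses its own top 10 SHAP-ranked features and fitted coefficients, which the abstract does not list.

```python
import math

# Hypothetical intercept, coefficients, and value ranges (illustrative only).
INTERCEPT = -1.0
FEATURES = {
    # name: (log-odds coefficient, minimum value, maximum value)
    "nihss": (0.30, 0, 20),
    "barthel_index": (-0.04, 0, 100),
    "recurrent_stroke": (1.10, 0, 1),
}


def reference_value(coef, lo, hi):
    """Value at which a feature contributes least to the log-odds (0 points)."""
    return lo if coef >= 0 else hi


# Largest log-odds swing any single feature can produce; it defines 100 points.
MAX_EFFECT = max(abs(c) * (hi - lo) for c, lo, hi in FEATURES.values())


def points(name, value):
    """Nomogram points for one feature value, on a 0 to 100 scale."""
    coef, lo, hi = FEATURES[name]
    return 100.0 * coef * (value - reference_value(coef, lo, hi)) / MAX_EFFECT


def total_points(patient):
    """Sum of the per-feature points for one patient."""
    return sum(points(name, patient[name]) for name in FEATURES)


def probability_from_points(total):
    """Map total points back to the predicted probability."""
    base = INTERCEPT + sum(
        c * reference_value(c, lo, hi) for c, lo, hi in FEATURES.values()
    )
    return 1.0 / (1.0 + math.exp(-(base + total * MAX_EFFECT / 100.0)))


if __name__ == "__main__":
    patient = {"nihss": 12, "barthel_index": 40, "recurrent_stroke": 1}
    total = total_points(patient)
    print(round(total, 1), round(probability_from_points(total), 3))
```

Because the points are a linear rescaling, the probability read from the total-points axis matches what the logistic model would return directly, so the nomogram loses no information relative to the fitted model.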
