The XGBoost Model Versus the Logistic Regression Model Created Based on Serum Markers in Predicting the Risk of Post-Stroke Cognitive Impairment Following Acute Ischemic Stroke

Abstract

BACKGROUND: Acute ischemic stroke is a major cause of cognitive dysfunction, and early identification of post-stroke cognitive impairment (PSCI) is crucial for improving patient prognosis. Although prognostic models for acute ischemic stroke have been studied extensively, the selection of predictive factors still relies heavily on neuroimaging parameters. This study aimed to develop and compare eXtreme gradient boosting (XGBoost) and logistic regression (LR) models based on serum biomarkers for predicting the risk of PSCI after acute ischemic stroke.

METHODS: The study enrolled 261 adult patients with acute ischemic stroke within 7 days of onset. Baseline characteristics, serum markers, and scores on the National Institutes of Health Stroke Scale (NIHSS) and the Montreal Cognitive Assessment (MoCA) were collected. Cognitive function was assessed 3 months (±2 weeks) after stroke, and PSCI was diagnosed as a MoCA score < 26. Patients were randomly assigned to a training dataset (n = 183) and a testing dataset (n = 78) in a 7:3 ratio. Features predictive of PSCI risk were selected via LassoCV in R. Accuracy, F1 score, Cohen's kappa, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated to evaluate the performance of the XGBoost and LR prediction models. Finally, the optimal prediction model was interpreted with SHapley Additive exPlanations (SHAP) beeswarm and force plots.

RESULTS: The incidence of PSCI and other baseline characteristics were comparable between the training and testing datasets (all P > 0.05). Vascular endothelial cadherin (VE-Cad), NIHSS score, age, drinking history, C-reactive protein (CRP), and years of education were identified as features associated with PSCI risk. The XGBoost model outperformed the LR model in accuracy, F1 score, and sensitivity for predicting PSCI risk.
SHAP beeswarm and force plots visualized how each feature contributed to the XGBoost model's predictions of PSCI risk in patients with acute ischemic stroke.

CONCLUSION: Based on serum biomarkers, the XGBoost model can accurately predict the risk of PSCI in patients with acute ischemic stroke, outperforms the LR model, and may serve as a reliable tool for early identification and improved diagnosis.

We collected demographic data, cognitive assessments, and serum indicators from 261 patients with acute ischemic stroke (training n = 183, testing n = 78). LassoCV identified predictors including VE-Cad, NIHSS score, CRP, age, drinking history, and years of education. The XGBoost model outperformed LR in predicting PSCI risk, and SHAP analysis revealed how these variables influenced the model's predictions. Based on serum biomarkers, the XGBoost model accurately predicts PSCI risk and may serve as a reliable tool for early identification to improve diagnosis.
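
All seven evaluation metrics named in the METHODS can be derived from one 2×2 confusion matrix on the testing dataset. The Python sketch below shows the standard formulas under stated assumptions: the helper names `psci_label`, `confusion_counts`, and `classification_metrics` are hypothetical, the MoCA < 26 rule mirrors the abstract's diagnostic threshold, and the example counts are made up (chosen only to sum to the testing-set size of 78). They are not the study's results.

```python
def psci_label(moca_score):
    """Outcome coding taken from the abstract: PSCI (1) if MoCA < 26, else 0."""
    return 1 if moca_score < 26 else 0


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 = PSCI."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def _ratio(num, den):
    # Undefined metrics (e.g. PPV when nothing is predicted positive) become NaN.
    return num / den if den else float("nan")


def classification_metrics(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("empty confusion matrix")
    sensitivity = _ratio(tp, tp + fn)        # recall on true PSCI cases
    specificity = _ratio(tn, tn + fp)        # recall on true non-PSCI cases
    ppv = _ratio(tp, tp + fp)                # precision
    npv = _ratio(tn, tn + fn)
    accuracy = (tp + tn) / n
    f1 = _ratio(2 * tp, 2 * tp + fp + fn)    # harmonic mean of PPV and sensitivity
    # Cohen's kappa: observed agreement corrected for chance agreement, where
    # chance = P(true=1) * P(pred=1) + P(true=0) * P(pred=0).
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = _ratio(accuracy - p_chance, 1 - p_chance)
    return {
        "accuracy": accuracy,
        "f1": f1,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
    }


if __name__ == "__main__":
    # Hypothetical test-set counts (not from the study): 38 PSCI, 40 non-PSCI.
    metrics = classification_metrics(tp=30, fp=10, fn=8, tn=30)
    for name, value in metrics.items():
        print(f"{name:>11}: {value:.3f}")
```

Sensitivity and specificity condition on the true outcome, while PPV and NPV condition on the prediction, so each metric reads a different slice of the same matrix; this is why two models can be compared metric by metric, as the abstract does for accuracy, F1 score, and sensitivity.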

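SHAP force and beeswarm plots are built from Shapley values, which split one patient's model output into per-feature contributions that add up exactly: base value plus the sum of contributions equals the model output. The Python sketch below computes exact Shapley values by enumerating every feature coalition for a toy three-feature model on the log-odds scale. Everything in it is hypothetical: the function `toy_log_odds` and its coefficients are made up (this is not the study's fitted XGBoost model), and "absent" features are simply set to a single reference patient. The shap library's TreeExplainer instead uses the polynomial-time TreeSHAP algorithm for tree ensembles such as XGBoost, with a value function derived from the training or background data; brute-force enumeration needs O(2^n) model calls and is for illustration only.

```python
from itertools import combinations
from math import factorial


def toy_log_odds(x):
    """Hypothetical model: made-up coefficients, including one a*c interaction."""
    a, b, c = x
    return -2.0 + 0.8 * a + 0.5 * b + 0.3 * a * c


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take the patient's value; the rest take the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    baseline = [0.0, 0.0, 0.0]  # hypothetical reference patient
    patient = [1.0, 2.0, 3.0]   # hypothetical patient to explain
    phi = shapley_values(toy_log_odds, patient, baseline)
    base_value = toy_log_odds(baseline)
    print("base value:", base_value)
    print("contributions:", phi)
    # Additivity, which a force plot draws: base value + sum(phi) == model output.
    print("reconstructed:", base_value + sum(phi), "model:", toy_log_odds(patient))
```

In this toy case the interaction term 0.3·a·c is shared equally between features a and c (0.45 each), so the contributions are about [1.25, 1.0, 0.45]; they sum to the 2.7 gap between the base value (-2.0) and the patient's output (0.7). A beeswarm plot stacks such per-patient contributions across all patients, one row per feature.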