Explainable machine learning model for 1-year readmission risk prediction in AECOPD patients: integrating relief feature selection with sample augmentation

Abstract

BACKGROUND: Patients with acute exacerbations of chronic obstructive pulmonary disease (AECOPD) face a high risk of readmission following discharge. Accurate identification of high-risk individuals is crucial for optimising clinical management. However, clinical prediction models frequently encounter limited sample sizes, missing data, and class imbalance, which compromise their generalisability and clinical utility.

METHODS: This retrospective study included patients first hospitalised for AECOPD at a tertiary hospital between December 2018 and July 2023. The primary outcome was unplanned all-cause readmission within one year of discharge. Missing data were handled using Multiple Imputation by Chained Equations (MICE). To enhance model robustness, a conditional tabular generative adversarial network (CTGAN) was applied to the 80% training split of the derivation cohort for data augmentation (generating 150% of the original sample size). Logistic regression, decision tree, random forest, XGBoost, and LightGBM models were built on the augmented data. Hyperparameters were optimised using grid search with 5-fold cross-validation, and performance was evaluated on the reserved 20% test set. The predictions of the best-performing model were interpreted using the SHapley Additive exPlanations (SHAP) framework.

RESULTS: A total of 1,960 patients were included, of whom 783 (39.9%) were readmitted. Data augmentation mitigated overfitting and significantly improved model generalisation on the test set. The XGBoost model performed best, achieving an AUC of 0.696 on the test set alongside favourable calibration and clinical net benefit. SHAP analysis identified five features as the most important drivers of model predictions: eosinophil count (EOS, negatively correlated with readmission risk), ICU admission status (positively correlated), red cell distribution width (RDW-SD, positively correlated), Prognostic Nutritional Index (PNI, negatively correlated), and platelet-lymphocyte ratio (PLR, positively correlated).

CONCLUSION: This study developed and validated a readmission risk prediction model for AECOPD patients based on routine clinical variables. Integrating CTGAN data augmentation enhanced model performance. The optimal XGBoost model not only demonstrated strong discriminative capability but also, as revealed by SHAP analysis, exhibited interpretable predictive logic consistent with clinical pathophysiological mechanisms. The model holds potential for clinical translation, aiding the identification of patients at high risk of readmission and enabling early intervention.

CLINICAL TRIAL: Not applicable.
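The hold-out protocol and the AUC metric mentioned in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code. The function names `stratified_split` and `auc` are hypothetical, the stratification by outcome is an assumption (the abstract does not say how the 80/20 split was drawn), and the synthetic-row count reflects only one possible reading of "150% of the original sample size". The sketch shows that the test set is set aside before any augmentation, and that AUC is the probability that a randomly chosen readmitted patient receives a higher score than a randomly chosen non-readmitted one.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), preserving the class ratio in both parts.

    Assumption: stratification by outcome; the abstract does not specify this.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def auc(y_true, scores):
    """AUC as pairwise concordance: P(score of a positive > score of a negative).

    Tied scores count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Cohort counts taken from the abstract: 1,960 patients, 783 readmitted.
labels = [1] * 783 + [0] * (1960 - 783)
train_idx, test_idx = stratified_split(labels)

# Augmentation would be fitted on the training rows only, so that no
# synthetic row can carry information from the held-out test patients.
# Assumption: synthetic rows equal to 150% of the training-set size.
n_synthetic = int(1.5 * len(train_idx))

# Toy scores illustrating the metric (not study data).
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Under this reading, an AUC of 0.696 means that in roughly 70% of readmitted/non-readmitted patient pairs in the test set, the model assigns the higher risk score to the readmitted patient.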
