Synthetic data-augmented machine learning for 30-day readmission prediction in patients with chronic conditions: a retrospective real-world study

Abstract

OBJECTIVES: To develop and evaluate an explainable machine learning framework, enhanced with synthetic data generation, for predicting unplanned 30-day hospital readmissions among patients with chronic obstructive pulmonary disease (COPD), heart failure (HF) and type 2 diabetes mellitus (T2DM), and to identify key clinical and social predictors of readmission.

DESIGN: A retrospective cohort study using electronic health record data, incorporating both structured variables and information extracted from unstructured clinical notes. Synthetic data were generated using advanced resampling and deep learning-based techniques to address outcome imbalance and improve model training.

SETTING: Intensive care unit and general ward admissions at a single tertiary academic medical centre included in the MIMIC-IV (Medical Information Mart for Intensive Care IV) database.

PARTICIPANTS: Adult patients (≥18 years) admitted with a primary diagnosis of COPD (n=14 050), HF (n=7097) or T2DM (n=12 735) between 2008 and 2019, with complete 30-day follow-up and no in-hospital mortality during the index admission.

PRIMARY AND SECONDARY OUTCOME MEASURES: The primary outcome was unplanned all-cause hospital readmission within 30 days of discharge. Predictors were drawn from six domains: demographics, comorbidities, clinical acuity, therapies, behavioural factors and care continuity. Predictive performance was evaluated using multiple machine learning methods and fivefold cross-validation, with model interpretability assessed using established global and local explanation approaches.

RESULTS: Ensemble-based machine learning models demonstrated the strongest predictive performance across all three disease cohorts. Key predictors of readmission included higher illness severity, greater comorbidity burden, medication non-adherence, gaps in preventive care and limited social support. Models incorporating synthetic data augmentation showed improved discrimination compared with models trained on original data alone.

CONCLUSIONS: An explainable, synthetic data-driven framework incorporating clinical, behavioural and social data can support prediction of 30-day readmissions among patients with common chronic conditions using routinely available electronic health record data.
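
The abstract does not name the specific resampling or deep learning generators used, so the following is only a minimal sketch of the general idea of combining synthetic minority-class data with fivefold cross-validation. It assumes binary labels (1 = readmitted, the minority class) and numeric feature rows, and it substitutes a simplified, neighbour-free form of SMOTE-style interpolation for the authors' methods. All function names are hypothetical. The one design point it illustrates is that synthetic rows are generated from, and added to, the training folds only, so each held-out fold contains real patients exclusively.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split row indices into k folds with a similar class ratio in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal indices round-robin across folds
    return folds


def interpolate_minority(rows, labels, minority=1, seed=0):
    """Create synthetic minority rows until both classes are the same size.

    Assumption: each synthetic row is a random point on the line segment
    between two randomly chosen real minority rows. This is a stand-in for
    the unnamed generators in the abstract, not a reproduction of them.
    """
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    if not minority_rows:
        return []
    n_needed = len(rows) - 2 * len(minority_rows)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        a, b = rng.choice(minority_rows), rng.choice(minority_rows)
        t = rng.random()
        synthetic.append([x + t * (z - x) for x, z in zip(a, b)])
    return synthetic


def augmented_cv_splits(rows, labels, k=5, minority=1, seed=0):
    """Yield (train_rows, train_labels, test_rows, test_labels) for each fold.

    Only the training portion is augmented; the test fold stays untouched.
    """
    folds = stratified_folds(labels, k, seed)
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        train_rows = [rows[i] for i in train_idx]
        train_labels = [labels[i] for i in train_idx]
        extra = interpolate_minority(train_rows, train_labels, minority, seed + f)
        yield (
            train_rows + extra,
            train_labels + [minority] * len(extra),
            [rows[i] for i in test_idx],
            [labels[i] for i in test_idx],
        )
```

Each tuple yielded by `augmented_cv_splits` can be passed to any classifier: fit on the augmented training rows, then score discrimination on the untouched test rows. Augmenting before splitting would let interpolated copies of test patients leak into training and inflate the apparent performance.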
