A holistic framework for intradialytic hypotension prediction using generative adversarial networks-based data balancing

Abstract

BACKGROUND: Intradialytic hypotension (IDH) is a frequent complication of hemodialysis, yet predictive modeling is hampered by class imbalance, and traditional oversampling methods often struggle with complex clinical data. This study evaluates an enhanced conditional Wasserstein Generative Adversarial Network with Gradient Penalty (CWGAN-GP) framework that aims to improve IDH prediction by generating high-utility synthetic data for balancing.

METHODS: A CWGAN-GP was developed using multi-level hemodialysis data. Following rigorous preprocessing, including a strict temporal train-test split, the CWGAN-GP generated minority-class samples using the training data only. eXtreme Gradient Boosting (XGBoost) models were trained on the original imbalanced data and on datasets balanced with the proposed CWGAN-GP method, and were benchmarked against balancing with the traditional Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling Approach (ADASYN). Performance was evaluated using metrics sensitive to imbalance (e.g., Precision-Recall Area Under the Curve, PR-AUC) and statistical comparisons, with SHapley Additive exPlanations (SHAP) analysis for interpretability.

RESULTS: The study population consisted of 40 chronic hemodialysis patients (45% male, mean age 66.30 ± 10.68 years). In the initial dataset, IDH events occurred in 14.85% of records (19,124 instances overall). A temporal split (75:25 ratio) yielded an Original Training dataset of 95,856 samples (14.73% IDH rate) and a test set (15.21% IDH rate). From the Original Training dataset, the GAN was used to construct a balanced dataset of 163,470 samples. The GAN Balanced dataset yielded the highest predictive performance, with statistically significant improvements over the Original Training dataset across metrics, including PR-AUC (mean 0.735 vs 0.724) and Accuracy (mean 0.900 vs 0.892). The GAN Augmented dataset (191,712 samples) showed mixed results: Accuracy and F1 improved, while Receiver Operating Characteristic Area Under the Curve (ROC-AUC) and PR-AUC decreased. The ADASYN (163,326 samples) and SMOTE (163,470 samples) balanced datasets significantly underperformed on PR-AUC. SHAP analysis identified Dialysis Date (as a proxy for temporal patterns such as day of week) and hemodynamic indicators (e.g., Systolic Diastolic Difference, Previous Systolic Pressure) as key IDH predictors.

CONCLUSION: The proposed CWGAN-GP framework effectively balances complex hemodialysis data, leading to significantly improved and interpretable IDH prediction models compared with standard approaches. This work supports leveraging advanced generative models such as GANs to overcome data imbalance in clinical prediction tasks, pending further validation.
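
The sample counts reported above can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the study's code: it assumes that "balanced" means a 1:1 class ratio and that every real non-IDH record is kept unchanged, neither of which the abstract states explicitly. The function name `infer_class_counts` is hypothetical.

```python
def infer_class_counts(n_train: int, n_balanced: int) -> tuple[int, int, int]:
    """Infer class sizes from the training-set and balanced-set sizes.

    Assumption (not stated in the abstract): balancing targets a 1:1 class
    ratio and keeps every majority (non-IDH) record unchanged.
    """
    if n_balanced % 2 != 0:
        raise ValueError("a 1:1 balanced set must have an even size")
    n_majority = n_balanced // 2            # real non-IDH records
    n_minority = n_train - n_majority       # real IDH records
    n_synthetic = n_majority - n_minority   # IDH records the generator must add
    return n_majority, n_minority, n_synthetic


# Figures reported in the abstract.
N_TRAIN = 95_856      # Original Training dataset
N_BALANCED = 163_470  # GAN Balanced dataset

n_majority, n_minority, n_synthetic = infer_class_counts(N_TRAIN, N_BALANCED)
idh_rate_pct = round(100 * n_minority / N_TRAIN, 2)

print(n_majority, n_minority, n_synthetic, idh_rate_pct)
```

Under these assumptions the inferred IDH rate of the training set comes out at 14.73%, which agrees with the reported figure and is consistent with a 1:1 reading of the balanced dataset. On that reading, roughly 67,614 synthetic IDH records would have been added to the 14,121 real ones.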
