Abstract
End-stage renal disease (ESRD) is associated with high morbidity and mortality. Identifying patients with stage 4 chronic kidney disease (CKD) who are at risk of short-term progression to ESRD remains challenging, and accurate prediction can improve advanced care planning and patient outcomes. This study aimed to develop and validate a machine learning (ML) model for predicting progression to ESRD within 25 weeks (approximately six months) in patients with stage 4 CKD. Electronic health records (EHRs) of patients with stage 4 CKD were analyzed. Nine ML models, including Ridge regression (Ridge), random forest (RF), and eXtreme Gradient Boosting (XGBoost), were used to predict progression to ESRD within 25 weeks. The models were trained and externally validated using data from 346 and 105 patients, respectively. Of the 451 patients with stage 4 CKD, 219 developed ESRD. Among the evaluated models, XGBoost demonstrated the best overall performance. In the internal validation, it achieved an area under the curve (AUC) of 0.93, an accuracy of 0.90, and an F1 score of 0.89. In the external validation, XGBoost maintained the highest AUC (0.85), accuracy (0.79), and F1 score (0.79), along with the highest average precision (0.89) and a low log-loss (0.48), indicating strong discriminative ability and good generalizability. The top predictive features included high-density lipoprotein cholesterol, albumin (Alb), cystatin C (Cys C), apolipoprotein B (ApoB), FGB, blood urea nitrogen (BUN), neutrophils, and total cholesterol. This study demonstrated the feasibility of using ML to assess ESRD prognosis based on easily accessible clinical features. XGBoost performed best in both internal and external validation, suggesting its potential for future patient screening.
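For readers less familiar with the reported metrics, the following is a minimal sketch of how AUC, accuracy, F1 score, average precision, and log-loss can be computed for a binary classifier. It is not the study's code: the function names, the 0.5 decision threshold, and the eight toy labels and probabilities are assumptions chosen purely for illustration, and the outputs bear no relation to the results reported above.

```python
import math

def predict_labels(y_prob, threshold=0.5):
    # Turn predicted probabilities into 0/1 labels (threshold is an assumption).
    return [1 if p >= threshold else 0 for p in y_prob]

def accuracy(y_true, y_prob, threshold=0.5):
    preds = predict_labels(y_prob, threshold)
    return sum(1 for p, t in zip(preds, y_true) if p == t) / len(y_true)

def f1_score(y_true, y_prob, threshold=0.5):
    preds = predict_labels(y_prob, threshold)
    tp = sum(1 for p, t in zip(preds, y_true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, y_true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, y_true) if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def log_loss(y_true, y_prob, eps=1e-15):
    # Mean negative log-likelihood; probabilities are clipped to avoid log(0).
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)

def roc_auc(y_true, y_prob):
    # Fraction of (positive, negative) pairs ranked correctly; ties count 0.5.
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, y_prob):
    # Mean of the precision at the rank of each positive (assumes no tied scores).
    order = sorted(range(len(y_prob)), key=lambda i: -y_prob[i])
    tp, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / rank
    return total / tp if tp else 0.0

# Hypothetical toy data: 1 = progressed to ESRD, 0 = did not progress.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]

metrics = {
    "AUC": roc_auc(y_true, y_prob),
    "accuracy": accuracy(y_true, y_prob),
    "F1": f1_score(y_true, y_prob),
    "average precision": average_precision(y_true, y_prob),
    "log-loss": log_loss(y_true, y_prob),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC and average precision depend only on how the model ranks patients, whereas accuracy and F1 depend on the chosen threshold and log-loss on how well calibrated the probabilities are. This is one reason for reporting them together.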