Abstract
INTRODUCTION: Predicting disease progression at all stages of chronic kidney disease (CKD) may help improve patient outcomes. We therefore aimed to develop and externally validate a random forest model that predicts CKD progression from demographic and laboratory data.

METHODS: The model was developed in a population-based cohort from Manitoba, Canada, between April 1, 2006, and December 31, 2016, and externally validated in a cohort from Alberta, Canada. We included 77,196 individuals from Manitoba and 107,097 from Alberta who had an estimated glomerular filtration rate (eGFR) > 10 ml/min per 1.73 m² and an available urine albumin-to-creatinine ratio (ACR). We considered >80 laboratory features, including analytes from complete blood cell counts, chemistry panels, liver enzymes, urinalysis, and quantification of urine albumin and protein. The primary outcome was a 40% decline in eGFR or kidney failure. We assessed model discrimination using the area under the receiver operating characteristic curve (AUC) and calibration using plots of observed versus predicted risks.

RESULTS: In internal testing, the final model achieved an AUC of 0.88 (95% CI 0.87-0.89) at 2 years and 0.84 (0.83-0.85) at 5 years. Discrimination and calibration were preserved in the external validation data set, with AUCs of 0.87 (0.86-0.88) at 2 years and 0.84 (0.84-0.86) at 5 years. The top 30% of individuals by predicted risk (the high-risk and intermediate-risk groups) accounted for 87% of CKD progression events within 2 years and 77% within 5 years.

CONCLUSION: A machine learning model that leverages routinely collected laboratory data can accurately predict a 40% decline in eGFR or kidney failure.
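
To make the two reported summary measures concrete, the following is a minimal sketch of how an AUC and the share of events captured by the top 30% of predicted risks could be computed. It is not the study's code: the function names `auc` and `events_captured` are hypothetical, the AUC uses the generic pairwise (Mann-Whitney) definition rather than any method confirmed by the authors, and the ten scores and labels are invented toy values, not study data.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise AUC: probability that a randomly chosen individual with the
    event has a higher predicted risk than one without (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def events_captured(scores: Sequence[float], labels: Sequence[int],
                    top_fraction: float) -> float:
    """Share of all events found among the top_fraction of individuals
    ranked by predicted risk (highest first)."""
    total_events = sum(labels)
    if total_events == 0:
        raise ValueError("no events in the data")
    k = round(top_fraction * len(scores))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(y for _, y in ranked[:k]) / total_events


# Invented toy data (NOT from the study): predicted risks and observed
# outcomes (1 = 40% eGFR decline or kidney failure, 0 = no progression).
scores = [0.92, 0.85, 0.70, 0.55, 0.40, 0.35, 0.20, 0.15, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print("AUC:", auc(scores, labels))
print("Events in top 30%:", events_captured(scores, labels, 0.30))
```

The pairwise loop is quadratic in cohort size, so it is only suitable for illustration; for cohorts of the size described here, a rank-based implementation from an established library would be the practical choice.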