Abstract
OBJECTIVE: To develop and validate machine learning models to predict osteoporosis risk in chronic kidney disease (CKD) patients. METHODS: Data from the National Health and Nutrition Examination Survey (2005-2010, 2013-2014, 2017-2018) included 50,463 participants. Separate models for male and female CKD patients were developed using 59 potential predictors, with key variables selected through the Least Absolute Shrinkage and Selection Operator (LASSO) and Boruta algorithms. Seven single base models were trained: logistic regression, support vector machine, extreme gradient boosting, K-nearest neighbors, gradient boosting decision tree, random forest (RF), and neural network. Additionally, stacking ensemble models were constructed. Model performance was evaluated using receiver operating characteristic curves, F1 scores, Matthews correlation coefficient, and Brier scores. The best models were externally validated and visualized for interpretability. RESULTS: Among 3796 CKD patients, osteoporosis prevalence was 12.54% (7.28% in males and 17.57% in females). RF models demonstrated superior performance in both sexes. The male RF model achieved an area under the curve of 0.845 in the testing set and 0.728 in the external validation set, while the female RF model achieved 0.859 and 0.812, respectively. Shapley additive explanations (SHAP) summary plots showed that the top five important features for the male RF model were weight, age, height, Non-Hispanic Black ethnicity, and estimated glomerular filtration rate. For the female RF model, the top five important features were weight, use of female hormone medications, age, Non-Hispanic Black ethnicity, and red blood cell count. Online calculators were constructed to facilitate practical clinical application. CONCLUSIONS: The RF model in female CKD patients demonstrated strong predictive performance for osteoporosis, while models for males were less effective.
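The evaluation metrics named in the METHODS (area under the ROC curve, F1 score, Matthews correlation coefficient, and Brier score) can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names, the 0.5 classification threshold, and the toy labels and probabilities are all assumptions made for illustration only.

```python
from math import sqrt

def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding predicted probabilities.
    The 0.5 default threshold is an assumption, not taken from the study."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def f1_score(tp, fp, tn, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def matthews_corrcoef(tp, fp, tn, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Toy data (hypothetical): 1 = osteoporosis, 0 = no osteoporosis.
    labels = [0, 0, 1, 1]
    probs = [0.1, 0.4, 0.35, 0.8]
    counts = confusion_counts(labels, probs)
    print("AUC:", roc_auc(labels, probs))
    print("F1:", f1_score(*counts))
    print("MCC:", matthews_corrcoef(*counts))
    print("Brier:", brier_score(labels, probs))
```

AUC and the Brier score are computed from the predicted probabilities directly, whereas F1 and MCC depend on the chosen threshold. With an outcome prevalence as low as the 7.28% reported for males, threshold-dependent metrics can shift considerably as the cutoff changes.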