Abstract
BACKGROUND: Osteoporosis is a major health challenge in aging populations, yet its diagnosis depends largely on dual-energy X-ray absorptiometry (DXA), which is costly and exposes patients to ionizing radiation. This study aimed to develop a practical, non-radiographic prediction model for osteoporosis using interpretable machine learning techniques and to implement it as an accessible online calculator for rapid clinical and community screening.
METHODS: Data were derived from the 2008–2011 waves of the Korean National Health and Nutrition Examination Survey (KNHANES). Individuals with more than 30% missing data were excluded; remaining missing values were imputed by polynomial interpolation (continuous variables) and mode imputation (categorical variables). Spearman correlation analysis (p < 0.001) was used to identify osteoporosis-related features, after which GradientBoost-based recursive feature elimination (GradientBoost-RFE) and LASSO regression were applied for dimensionality reduction, yielding 15 essential predictors, including age, sex, and body mass index (BMI). GradientBoost, CatBoost, and XGBoost algorithms were trained to estimate abnormal DXA results and classify bone status. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), specificity (SPE), and accuracy (ACC), and tested on a temporal validation set (the 2008 wave of KNHANES).
RESULTS: A total of 18,179 participants were included, with 14,747 in the development cohort and 3,432 in the temporal validation set. Among them, 64.6% had normal DXA results. The optimal model achieved an AUC of 0.845 and an SPE of 0.897 for identifying abnormal DXA results, and an AUC of 0.876 and an SPE of 0.909 in temporal validation. For multiclass classification (normal, osteopenia, osteoporosis), the model reached an ACC of 0.724 and 0.744 and an SPE of 0.803 and 0.819 in the development and validation datasets, respectively.
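For readers less familiar with the reported metrics, the following is a minimal sketch of how AUC, SPE, and ACC can be computed from labels and predictions. It is illustrative only and is not the authors' code: the function names and the toy data are hypothetical, and the abstract does not state how SPE was aggregated in the three-class setting, so the macro-averaged one-vs-rest form used here is an assumption.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def specificity(y_true: Sequence[int], y_pred: Sequence[int], positive: int) -> float:
    """True-negative rate TN / (TN + FP), treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tn / (tn + fp)


def macro_specificity(y_true: Sequence[int], y_pred: Sequence[int],
                      classes: Sequence[int]) -> float:
    """Unweighted mean of one-vs-rest specificities (assumed aggregation)."""
    return sum(specificity(y_true, y_pred, c) for c in classes) / len(classes)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case; tied scores count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical binary example: 0 = normal DXA, 1 = abnormal DXA.
bin_true = [0, 0, 0, 0, 1, 1, 1, 1]
bin_scores = [0.1, 0.3, 0.6, 0.2, 0.8, 0.4, 0.9, 0.7]
bin_pred = [int(s >= 0.5) for s in bin_scores]  # 0.5 cut-off is arbitrary

# Hypothetical three-class example: 0 = normal, 1 = osteopenia, 2 = osteoporosis.
multi_true = [0, 0, 1, 1, 2, 2]
multi_pred = [0, 1, 1, 1, 2, 0]

if __name__ == "__main__":
    print("binary AUC:", auc(bin_true, bin_scores))
    print("binary SPE:", specificity(bin_true, bin_pred, positive=1))
    print("multiclass ACC:", accuracy(multi_true, multi_pred))
    print("multiclass SPE:", macro_specificity(multi_true, multi_pred, [0, 1, 2]))
```

The pairwise AUC above is quadratic in the number of cases, which is fine for illustration; on a cohort of this size, a rank-based implementation from an established library would be the more practical choice.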
CONCLUSION: We developed and validated an interpretable machine learning model that accurately predicts osteoporosis risk and DXA abnormalities using readily available demographic, biochemical, and lifestyle data. To facilitate clinical translation, the model has been deployed as an interactive online calculator, enabling non-invasive, rapid osteoporosis risk assessment without radiological testing. This tool may support early identification of high-risk individuals, optimize DXA utilization, and enhance preventive care strategies across diverse healthcare settings.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13040-026-00520-w.