Abstract
Skeletal fluorosis (SF) is a chronic metabolic bone disease caused by long-term excessive fluoride exposure and affects millions of people worldwide. Conventional diagnosis relies on radiographic evidence, which often detects the disease only at advanced stages and so limits opportunities for early intervention and prevention. To address this gap, a predictive model of SF severity was developed using demographic, environmental, and biomonitoring predictors from 1,309 individuals across three major fluoride-endemic regions in China, representing coal-burning, drinking-water, and brick-tea fluoride exposure. After variable selection using least absolute shrinkage and selection operator (LASSO) regression, five machine learning algorithms were trained and validated. Model performance was evaluated primarily by the area under the receiver operating characteristic curve (AUC), and SHapley Additive exPlanations (SHAP) were applied to interpret model predictions. The Random Forest model achieved the best predictive performance (AUC = 0.875 in the training set; 0.832 in the test set). SHAP analysis identified pain score, joint function, age, and urinary fluoride (UF) concentration as the most influential predictors of SF severity. The model also captured regional differences in exposure and severity patterns across the three fluoride sources. This interpretable machine learning framework provides a tool for early risk screening and severity stratification of SF in high-risk populations. By enabling timely identification of individuals at risk of progression, the model can serve as a foundation for targeted public health interventions and highlights the utility of data-driven methods in large-scale environmental health surveillance.
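For readers less familiar with the evaluation metric, the AUC can be read as the probability that a randomly chosen higher-severity case receives a higher predicted score than a randomly chosen lower-severity case. The following is a minimal sketch of that pairwise definition, assuming a binary severity label (an assumption; the abstract does not state how severity was coded). The function name `roc_auc` and the toy values are hypothetical illustrations, not the study's code or data.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC: the fraction of positive/negative pairs
    in which the positive case has the higher score; ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy values for illustration only (not taken from the study).
toy_labels = [0, 0, 1, 1]            # assumed coding: 1 = more severe SF
toy_scores = [0.1, 0.4, 0.35, 0.8]   # hypothetical predicted probabilities
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

In the toy example, three of the four positive/negative pairs are ranked correctly, giving 0.75. On this scale, 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the reported test-set AUC of 0.832.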