Abstract
BACKGROUND: This study aimed to develop a predictive model that identifies rheumatoid arthritis (RA) patients at risk of low muscle mass from easily obtainable clinical indicators. The goal is to enable targeted screening of individuals at high risk of sarcopenia, optimize diagnostic strategies, reduce the burden of additional testing, and improve the efficiency of early identification and intervention. METHODS: We analyzed data from 1,260 RA patients drawn from the National Health and Nutrition Examination Survey (NHANES) database and the Affiliated Hospital of Shandong University of Traditional Chinese Medicine (SHUTCM). Eight machine learning models were developed: Random Forest, LightGBM, XGBoost, CatBoost, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Logistic Regression, and a weighted ensemble model. Model performance was evaluated using accuracy, area under the receiver operating characteristic curve (AUC), F1 score, precision, recall, and Brier score. The SHapley Additive exPlanations (SHAP) method was used to rank feature importance and interpret the final model. RESULTS: The tree-based weighted ensemble model performed best, achieving an AUC of 0.921 and outperforming all individual models. It showed good calibration and a higher net clinical benefit in decision curve analysis, especially for probability thresholds between 0.2 and 0.8. On the test set it achieved an AUC of 0.848, indicating reasonable generalizability. SHAP analysis identified BMI, albumin, hemoglobin, age, and creatinine as the most important features for predicting the risk of low muscle mass. SHAP dependence and waterfall plots further illustrated how the model arrives at its predictions.
Finally, we developed an online risk prediction calculator built on the FastAPI framework, which generates an individualized low muscle mass risk score from user input. The tool is deployed on the Hugging Face platform and is accessible online. CONCLUSION: Using a large, multicenter dataset, we developed and validated an explainable machine learning model that identifies RA patients at high risk of low muscle mass. The model may serve as a decision-support tool for clinicians in guiding further screening and diagnosis of sarcopenia.
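
For readers unfamiliar with the evaluation terms used above, the following minimal sketch illustrates, on toy data, how a weighted ensemble of predicted probabilities can be scored with AUC, the Brier score, and the net benefit used in decision curve analysis. It is an illustration under stated assumptions, not the study's implementation: the function names, the two hypothetical base models, their weights, and all probabilities are invented for the example, and the ensemble is assumed to be a weighted average of per-model probabilities.

```python
from typing import Sequence


def weighted_ensemble(prob_sets: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> list:
    """Weighted average of per-model predicted probabilities (assumed scheme)."""
    total = sum(weights)
    n = len(prob_sets[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_sets)) / total
        for i in range(n)
    ]


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy data (invented): 1 = low muscle mass, 0 = normal muscle mass.
y_true = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1]  # hypothetical base model A
model_b = [0.8, 0.3, 0.4, 0.5, 0.9, 0.2]  # hypothetical base model B
weights = [0.6, 0.4]                      # hypothetical ensemble weights

ensemble = weighted_ensemble([model_a, model_b], weights)

print("AUC (model B):  ", roc_auc(y_true, model_b))
print("AUC (ensemble): ", roc_auc(y_true, ensemble))
print("Brier score:    ", brier_score(y_true, ensemble))
print("Net benefit@0.5:", net_benefit(y_true, ensemble, 0.5))
```

A lower Brier score indicates better-calibrated probabilities, and a decision curve is obtained by evaluating `net_benefit` across a range of thresholds (for example 0.2 to 0.8) and comparing against treat-all and treat-none strategies.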