Abstract
Objectives: This study aimed to develop and validate an explainable machine learning (ML) model to predict the risk of sarcopenia in older European adults with arthritis, providing a practical tool for early and precise screening in clinical settings. Methods: We analyzed data from the English Longitudinal Study of Ageing (ELSA) and the Survey of Health, Ageing and Retirement in Europe (SHARE). The ELSA analytic sample comprised 1959 participants aged ≥65 years and was divided into a training set (n = 1371) and an internal validation set (n = 588); the SHARE dataset (n = 1001) served as an independent external test cohort. From an initial pool of 33 variables, nine core predictors were identified using ensemble feature selection techniques. Six ML algorithms were compared, with model performance evaluated using the area under the receiver operating characteristic curve (AUC) and calibration analysis. Model interpretability was enhanced via SHapley Additive exPlanations (SHAP). Results: The Decision Tree model offered the best balance between performance and interpretability. It achieved an AUC of 0.921 (95% CI: 0.848–0.988) in the internal validation set and generalized well to the external SHARE cohort, with an AUC of 0.958 (95% CI: 0.931–0.985). The nine key predictors were stroke history, body mass index (BMI), high-density lipoprotein (HDL), loneliness, walking speed, disease duration, age, recall summary score, and total cholesterol. SHAP analysis visualized the contribution of each feature to an individual's predicted risk. Conclusions: This study developed an explainable, lightweight ML model with high discriminative performance for sarcopenia risk prediction. An online tool generates an individualized risk assessment from only nine readily available clinical indicators. This facilitates early identification and risk stratification of sarcopenia in older European adults with arthritis, thereby providing decision support for implementing precision interventions.
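The abstract reports AUC values with 95% confidence intervals but does not state how the intervals were obtained. As an illustration only, the following minimal sketch computes an AUC and a percentile bootstrap interval, which is one common approach and is assumed here rather than taken from the study. The data are synthetic, and the function names `auc` and `bootstrap_auc_ci` are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case outranks a
    random negative case, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, not
    necessarily the one used in the study)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    low = stats[int((alpha / 2) * n_boot)]
    high = stats[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Synthetic validation set: about 30% positives, with higher scores on average.
rng = random.Random(42)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(150)]
scores = [rng.gauss(1.5 if y else 0.0, 1.0) for y in labels]

point = auc(labels, scores)
low, high = bootstrap_auc_ci(labels, scores, n_boot=500)
print(f"AUC = {point:.3f} (95% CI: {low:.3f}-{high:.3f})")
```

The pairwise comparison is quadratic in the sample size, which is acceptable for a small illustration; for cohorts of the size reported here, a rank-based implementation would be preferable.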