Abstract
BACKGROUND: Malnutrition is a critical concern associated with increased mortality and adverse outcomes in adults with stroke undergoing subacute rehabilitation. Despite its clinical significance, predictive tools for assessing malnutrition risk in this population remain limited. This study aimed to develop and validate an interpretable machine learning (ML) model to predict malnutrition risk among stroke patients during subacute rehabilitation.
METHODS: This multicenter study comprised a development cohort of 802 patients from a single institution, which was randomly split into training and testing sets at a 7:3 ratio. An external validation cohort of 345 patients was recruited from an independent hospital. Feature selection was conducted using Least Absolute Shrinkage and Selection Operator (LASSO) regression combined with the Boruta algorithm. Eight ML models were trained using five-fold cross-validation: Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), Support Vector Machine (SVM), k-Nearest Neighbors (KNN), Neural Network (NNet), and CatBoost (CAT). The models were evaluated on discrimination, calibration curves, and decision curve analysis (DCA). Model interpretability was assessed via Shapley Additive Explanations (SHAP) analysis.
RESULTS: The CAT model showed the best predictive performance in the training and testing sets, achieving an area under the receiver operating characteristic curve (AUC) of 0.848 (95% CI: 0.817-0.879) and 0.806 (95% CI: 0.752-0.861), respectively. Calibration analysis supported the model's robustness, and DCA supported its clinical utility. External validation further corroborated the generalizability of the CAT model, with an AUC of 0.772 (95% CI: 0.723-0.820). SHAP analysis identified age, handgrip strength, and Barthel Index (BI) score as the most important predictors of malnutrition.
CONCLUSION: This study developed and validated an interpretable ML model for efficiently screening malnutrition risk in patients with subacute stroke. The CAT-based model serves as a clinically actionable tool for early stratification of malnutrition risk. Such stratification may facilitate targeted nutritional interventions and personalized rehabilitation strategies, potentially improving outcomes in this vulnerable population.
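
The discrimination results above are reported as an AUC with a 95% confidence interval. The abstract does not state how the intervals were obtained, so the following is only a minimal sketch of one common approach, assuming a percentile bootstrap. The function names (`auc`, `bootstrap_auc_ci`), the resampling settings, and the synthetic labels and scores are illustrative assumptions and are not taken from the study.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=300, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Synthetic example data (hypothetical, not the study cohort):
# 1 = malnourished, 0 = not malnourished; scores mimic predicted risk.
rng = random.Random(42)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(150)]
scores = [rng.gauss(0.6 if y else 0.4, 0.15) for y in labels]

point = auc(labels, scores)
lower, upper = bootstrap_auc_ci(labels, scores)
print(f"AUC {point:.3f} (95% CI: {lower:.3f}-{upper:.3f})")
```

In practice the predicted risks would come from the fitted model on the training, testing, or external validation set, and an analytic interval (for example, DeLong's method) could be used in place of the bootstrap.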