Abstract
BACKGROUND: Post-stroke dysphagia (PSD), a common complication of cerebrovascular accidents, seriously affects patients’ quality of life and prognosis. This retrospective study aimed to develop a reliable machine learning model for predicting the prognosis of PSD. METHODS: Clinical data were collected from patients admitted to the Fourth Affiliated Hospital of Soochow University between January 2021 and December 2023. The dataset was split chronologically into a training set (January 2021 to May 2023, n = 377) and a temporal validation set (June to December 2023, n = 91). Feature variables were selected using correlation analysis and logistic regression. Five machine learning models were developed on the training set and subsequently evaluated on the temporal validation set using precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC). RESULTS: Eight feature variables were selected for model construction. On the temporal validation set, the Gradient Boosting Decision Tree (GBDT) model performed best, with a precision of 0.937, a recall of 0.795, an F1-score of 0.863, and an AUC of 0.940. In 5-fold cross-validation on the training set, Random Forest (RF) achieved the highest mean AUC (0.949), but its AUC decreased to 0.890 on the temporal validation set. The superior performance of GBDT on the temporal validation set suggests stronger generalization capability than the other models. CONCLUSIONS: The GBDT model showed robust performance on the temporal validation set, suggesting its potential clinical utility for predicting swallowing function recovery in post-stroke patients.
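The chronological split and the evaluation metrics named above can be illustrated with a minimal, standard-library sketch. It assumes a hypothetical record format (a dict with an `admitted` date), a hypothetical helper name `temporal_split`, and toy labels and scores; none of these are the study's data or code, and the study's own model training (e.g. GBDT) and any metric-averaging scheme are not reproduced here. AUC is computed in its pairwise (Mann-Whitney) form: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from datetime import date

def temporal_split(records, cutoff):
    """Hypothetical helper: admissions before `cutoff` go to training,
    admissions on or after `cutoff` go to temporal validation."""
    train = [r for r in records if r["admitted"] < cutoff]
    valid = [r for r in records if r["admitted"] >= cutoff]
    return train, valid

def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall and F1 (harmonic mean of precision and recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def auc(y_true, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy records (hypothetical): the cutoff mirrors the June 2023 boundary described above.
records = [
    {"admitted": date(2021, 3, 1)},
    {"admitted": date(2023, 5, 31)},
    {"admitted": date(2023, 6, 1)},
    {"admitted": date(2023, 12, 15)},
]
train, valid = temporal_split(records, date(2023, 6, 1))

# Toy labels (1 = recovery), thresholded predictions and predicted probabilities.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
roc_auc = auc(y_true, scores)
```

Splitting by admission date rather than at random keeps every validation patient later than every training patient, which is what lets the validation AUC speak to performance on future admissions.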