Abstract
INTRODUCTION: Post-stroke cognitive impairment (PSCI) is a prevalent and disabling consequence of stroke, yet objective tools for its early identification are lacking. This study aimed to develop and validate an interpretable machine learning (ML) model based on electroencephalography (EEG) to support the early detection of PSCI.
METHODS: We studied 174 participants, including stroke patients with and without cognitive impairment and age-matched healthy controls. Resting-state EEG was acquired from all participants, and multidimensional features, including power spectral ratios and microstate parameters, were extracted. Feature selection was performed using LASSO regression, random forest, and the Boruta algorithm. Five ML models were compared on area under the curve (AUC), accuracy, Brier score, calibration plots, and decision curve analysis. Model predictions were interpreted using SHAP (SHapley Additive exPlanations). The final validated model was deployed as an interactive web-based application.
RESULTS: Seven EEG features were identified as most predictive of PSCI: the delta-plus-theta to alpha-plus-beta ratio (DTABR) in frontal, central, and global regions; the mean microstate duration of classes A and B (A-MMD, B-MMD); the mean frequency of occurrence of microstate D (D-MFO); and the mean coverage of microstate A (A-MC). The random forest model performed best (AUC = 0.91, accuracy = 0.83, specificity = 0.88, Brier score = 0.12), with satisfactory calibration and a positive net clinical benefit. In an independent external cohort (n = 42), the model retained strong predictive performance (AUC = 0.97, accuracy = 0.90). A web tool for individualized risk prediction is available at https://eeg-predict.streamlit.app/.
DISCUSSION: These findings suggest that an interpretable EEG-based ML model can provide accurate early screening of PSCI. Integrating this approach into clinical workflows may support personalized rehabilitation strategies and improve post-stroke care. Larger, multicenter studies are needed to validate the model further.
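
As an illustration of two quantities named above, the following minimal Python sketch computes a DTABR value from band powers and a Brier score from predicted probabilities. It is not the authors' pipeline: the function names, the dictionary layout, and all numeric values are hypothetical, and the sketch assumes that absolute band powers have already been estimated for the region of interest.

```python
from typing import Mapping, Sequence


def dtabr(band_power: Mapping[str, float]) -> float:
    """Return (delta + theta) / (alpha + beta) for one region.

    Assumes `band_power` holds absolute power per band; how the bands
    are defined and estimated is left to the caller.
    """
    slow = band_power["delta"] + band_power["theta"]
    fast = band_power["alpha"] + band_power["beta"]
    if fast <= 0:
        raise ValueError("alpha + beta power must be positive")
    return slow / fast


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and equal in length")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


# Hypothetical illustrative values, not data from the study.
example_power = {"delta": 12.0, "theta": 8.0, "alpha": 6.0, "beta": 4.0}
example_ratio = dtabr(example_power)

example_probs = [0.9, 0.2, 0.7, 0.4]
example_labels = [1, 0, 1, 0]
example_brier = brier_score(example_probs, example_labels)

print(f"DTABR = {example_ratio:.2f}, Brier score = {example_brier:.3f}")
```

A higher DTABR indicates relatively more slow-wave (delta and theta) power, and a lower Brier score indicates better-calibrated probabilistic predictions, with 0 being a perfect score.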