Abstract
INTRODUCTION: Early prediction of stroke outcomes using prognostic tools may support clinical decision-making and inform resource allocation. However, the clinical information these tools require is often missing. We evaluated the performance of machine learning (ML) models that predict an adverse stroke outcome at 90 days post-admission by exploiting non-clinical data and data missingness alongside traditional clinical and demographic predictors. METHODS: We used routine hospital data from UK clinical sites (NHS SafeHaven) to train three gradient-boosted models. We compared baseline clinical features with non-clinical features and missingness as predictors of a composite 90-day adverse stroke outcome: mortality, stroke recurrence, or new discharge to a care home. A held-out 10% of the data was used for validation. Model performance was evaluated by accuracy (correct predictions/total predictions) and area under the receiver operating characteristic curve (AUC), and DeLong's test was used to compare the AUCs of the three models. We used the Brier score to evaluate model calibration. SHapley Additive exPlanations (SHAP) analyses quantified the contribution of each feature to predicting an adverse stroke outcome. RESULTS: The final sample included 3,530 stroke patients (51% male; mean age 72 years, SD 14). Clinical data were incomplete, with five clinical features missing more than 63% of their values. The performance of the three models did not differ significantly (p = 0.5–0.9). The model with non-clinical and missingness features achieved 71% accuracy, an AUC of 0.76, and a Brier score of 0.19. Non-clinical factors, such as time to clinical assessment and time to admission, were among the five most important predictors of an adverse stroke outcome (mean |SHAP| = 0.03 and 0.05, respectively), alongside Glasgow Coma Scale score (0.08), age (0.03), and temperature (0.02).
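The evaluation metrics named above can be made concrete with a minimal pure-Python sketch. It assumes binary outcome labels (1 = adverse outcome) and predicted probabilities from any classifier; the 0.5 decision threshold and all function names are illustrative assumptions, not the study's code. AUC is computed through its Mann-Whitney interpretation (the probability that a random positive case outscores a random negative one), and `delong_test` implements DeLong's variance estimate for two correlated AUCs measured on the same patients. The O(m·n) loops are adequate for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> float:
    """Correct predictions / total predictions, labelling p >= threshold as positive."""
    correct = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return correct / len(y_true)


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def _psi(x_pos: float, x_neg: float) -> float:
    # Mann-Whitney kernel: 1 if the positive case scores higher, 0.5 for a tie.
    return 1.0 if x_pos > x_neg else 0.5 if x_pos == x_neg else 0.0


def _components(y_true, y_prob) -> Tuple[List[float], List[float]]:
    """DeLong structural components: V10 (one per positive) and V01 (one per negative)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    v10 = [sum(_psi(xp, xn) for xn in neg) / len(neg) for xp in pos]
    v01 = [sum(_psi(xp, xn) for xp in pos) / len(pos) for xn in neg]
    return v10, v01


def auc(y_true, y_prob) -> float:
    """AUC as the mean of the V10 components."""
    v10, _ = _components(y_true, y_prob)
    return sum(v10) / len(v10)


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    # Sample covariance with an (n - 1) denominator.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(y_true, prob_a, prob_b) -> Tuple[float, float]:
    """Two-sided DeLong test for two AUCs on the same cases; returns (z, p)."""
    v10a, v01a = _components(y_true, prob_a)
    v10b, v01b = _components(y_true, prob_b)
    m, n = len(v10a), len(v01a)
    auc_a, auc_b = sum(v10a) / m, sum(v10b) / m
    var = (_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m \
        + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n
    if var <= 0:  # zero variance, e.g. identical score patterns for both models
        return 0.0, 1.0
    z = (auc_a - auc_b) / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
```

For instance, `delong_test(y, probs_clinical, probs_nonclinical)`, where both probability lists are hypothetical model outputs for the same validation patients, returns the z statistic and a two-sided p-value; a large p, as reported above, means the data do not show a difference in AUC between the two models.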
Missingness of clinical values (pulse and LDL) also predicted an adverse stroke outcome (mean |SHAP| = 0.02 for each) and was correlated with age (ρ = 0.2), arrival by ambulance (ρ = 0.3), length of stay (ρ = -0.3), and transient ischaemic attack (ρ = 0.3). CONCLUSION: We demonstrate that non-clinical factors and data missingness can assist in early prediction of 90-day adverse stroke outcomes. Because these factors are often well documented in electronic health record systems, they could complement or supplement traditional clinical predictors.
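The missingness analysis can be illustrated with a similar sketch. The abstract does not state how missingness was encoded, so this assumes one common approach: a binary indicator per clinical feature (1 = value not recorded), which a gradient-boosted model can use like any other feature. The abstract also reports ρ without naming the coefficient; the sketch assumes Spearman's rank correlation, computed as the Pearson correlation of average ranks so that the many ties in a 0/1 indicator are handled. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def missing_indicator(values: Sequence[Optional[float]]) -> List[int]:
    """1 where a clinical value was not recorded (None or NaN), else 0."""
    return [int(v is None or (isinstance(v, float) and math.isnan(v))) for v in values]


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

For example, `spearman_rho(missing_indicator(pulse), age)` correlates pulse missingness with age, where `pulse` and `age` are hypothetical per-patient lists; in a real analysis a library routine such as `scipy.stats.spearmanr` would also return a p-value.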