Abstract
INTRODUCTION: Acute dizziness accounts for approximately 4% of emergency department (ED) visits, and stroke is often missed in these patients. Current methods for detecting stroke in dizzy patients have notable limitations, and vestibular strokes are missed in a substantial proportion of ED visits. This study aimed to develop a machine learning (ML) tool to assess stroke risk in patients with acute dizziness.

METHODS: We developed an ensemble model combining four ML algorithms, using structured electronic medical record data and unstructured ED physician notes. Model performance was evaluated on a holdout test set and compared with the ABCD2 score using the area under the receiver operating characteristic curve (AUC), net reclassification improvement (NRI), integrated discrimination improvement (IDI), and decision curve analysis.

RESULTS: The ensemble model achieved the highest AUC (0.880), significantly outperforming both the ABCD2 score (AUC 0.673) and the individual ML models. It also showed the best calibration, with the lowest Brier score, and greater clinical utility across a range of risk thresholds. Features extracted from unstructured clinical text substantially improved performance: models combining structured and unstructured data consistently outperformed those trained on structured data alone.

CONCLUSIONS: Our ensemble prediction model effectively stratifies stroke risk in ED patients with acute dizziness. By integrating natural language processing of clinical notes with structured patient data, the model offers more accurate risk assessment than traditional methods. Implementing this tool could improve patient outcomes by directing advanced neuroimaging to high-risk patients while avoiding unnecessary testing in low-risk patients, ultimately enhancing patient safety and optimizing resource utilization.
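To make the evaluation metrics named above concrete, the following is a minimal Python sketch of three of them: AUC, the Brier score, and the net benefit used in decision curve analysis. It is an illustration under stated assumptions, not the study's code. The function names, the toy labels, and the toy probabilities are all hypothetical, and the standard textbook definitions are assumed (pairwise-ranking AUC, mean squared error for the Brier score, and net benefit as TP/n - FP/n * t/(1 - t) at risk threshold t).

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Fraction of (stroke, non-stroke) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of acting on patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the chosen risk threshold.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = stroke, 0 = no stroke (not from the study).
y_toy = [0, 0, 1, 1]
p_toy = [0.10, 0.40, 0.35, 0.80]

if __name__ == "__main__":
    print("AUC:", auc(y_toy, p_toy))
    print("Brier score:", brier_score(y_toy, p_toy))
    for t in (0.3, 0.5):
        print(f"Net benefit at threshold {t}:", net_benefit(y_toy, p_toy, t))
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds for each model. Passing a predicted risk of 1.0 for every patient gives the "image everyone" reference strategy, and a net benefit of zero corresponds to imaging no one.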