Abstract
BACKGROUND: Fatal and non-fatal drug overdoses have evolved into a critical public health crisis, with the rate of fatal drug overdose increasing by more than 50% since 2019. Emergency Medical Services (EMS) data have advantages over traditional emergency department data, including timeliness and the capture of non-transport encounters. However, there is no consensus EMS definition for suspected opioid overdose (SOO), and currently implemented knowledge-based (KB) definitions may miss ambiguous cases. Machine learning with natural language processing (ML-NLP) has the potential to enhance SOO identification. METHODS: Secondary data were drawn from an oversampled dataset of 2,327 weighted encounters from Kentucky State EMS data (2018-2022). EMS experts manually reviewed the records and determined ground-truth SOO labels. We examined five commonly accepted KB definitions, ranging from narrow to highly inclusive criteria and from structured-only data to combinations of structured and unstructured data. ML-NLP models were developed using various EMS data fields and KB indicators. The models and KB definitions were evaluated using sensitivity, specificity, accuracy, precision, and F1-score. RESULTS: The ML-NLP models outperformed the KB definitions, with the structured plus KB model achieving the highest F1-score (0.81). Structured-only approaches demonstrated low sensitivity (0.30-0.45). The inclusion of patient care narratives and additional structured fields improved model performance, with the ML-NLP models demonstrating high sensitivity (89.1%) and precision (89.0%). CONCLUSION: Integrated ML-NLP approaches offer significant improvements in opioid overdose surveillance compared to structured-only, unstructured-only, and KB-only approaches. Future research should explore the generalizability of these models across different populations and geographic areas.
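
For readers unfamiliar with the evaluation metrics named in METHODS, the following is a minimal sketch of how sensitivity, specificity, accuracy, precision, and F1-score can be computed from expert labels and predicted labels when each encounter carries a sampling weight. It is an illustration under stated assumptions, not the study's actual pipeline: the function names `weighted_confusion` and `evaluate`, the `Confusion` container, and the choice to sum weights into the confusion-matrix cells are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Weighted confusion-matrix cells (hypothetical container)."""
    tp: float = 0.0  # expert: SOO,     prediction: SOO
    fp: float = 0.0  # expert: not SOO, prediction: SOO
    tn: float = 0.0  # expert: not SOO, prediction: not SOO
    fn: float = 0.0  # expert: SOO,     prediction: not SOO


def weighted_confusion(y_true, y_pred, weights):
    """Sum each encounter's weight into its confusion-matrix cell.

    Assumption: weights are per-encounter sampling weights; how the
    study applied its weights is not described in the abstract.
    """
    c = Confusion()
    for truth, pred, w in zip(y_true, y_pred, weights):
        if truth and pred:
            c.tp += w
        elif pred:
            c.fp += w
        elif truth:
            c.fn += w
        else:
            c.tn += w
    return c


def _ratio(num, den):
    # Return NaN rather than raising when a denominator is empty.
    return num / den if den else float("nan")


def evaluate(c):
    """Return the five metrics listed in METHODS as a dict."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)
    precision = _ratio(c.tp, c.tp + c.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        # F1 is the harmonic mean of precision and sensitivity.
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }
```

The same two functions can score a KB definition or an ML-NLP model, since both reduce to a binary SOO prediction per encounter that is compared against the expert label.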