Abstract
BACKGROUND: Interictal epileptiform discharges (IEDs) are transient spikes or waves that occur in electroencephalography (EEG) records and can help support the diagnosis and classification of epilepsy. High-throughput machine learning models aim to automate the detection of IEDs. Previous evaluations of machine learning models have reported non-inferiority compared to human experts, but these studies have predominantly used small datasets of pre-selected, 'IED rich' records, which are not representative of clinical practice. Therefore, this study aims to analyse the diagnostic accuracy of a machine learning model in a large, routine, clinically representative cohort. METHODS: All routine EEGs performed at a large regional hospital in England between June 2024 and February 2025 were identified. EEG records were run through the commercial machine learning model P15 and automated IED reports were generated. The sensitivity, specificity, positive predictive value and negative predictive value of P15-detected IEDs were evaluated using the final clinical report as the reference standard. RESULTS: Of 484 EEG records, 53 were reported to contain at least one IED in the final clinical report. At P15's default sensitivity setting, sensitivity for IED detection was 81.1% (95% CI: 77.6-84.6), specificity 59.9% (95% CI: 55.5-64.2), positive predictive value 19.9% (95% CI: 16.3-23.5) and negative predictive value 96.3% (95% CI: 94.6-98.0). DISCUSSION: This large-scale study of a machine learning model for identification of IEDs in a representative clinical population found a high negative predictive value, suggesting that the model may be a useful tool to rule out IEDs. However, the low positive predictive value demonstrates the potential for over-calling IEDs in routine EEGs. Further research evaluating machine learning models alongside clinical feedback is needed before this approach can have sufficient utility in direct clinical care.
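The four accuracy measures reported in the RESULTS can be illustrated with a minimal Python sketch of the standard 2x2 calculation. The abstract gives only the totals (484 records, 53 IED-positive on the final clinical report). The cell counts below (43 true positives, 10 false negatives, 258 true negatives, 173 false positives) are an assumption, back-calculated from the reported percentages and not stated in the source. The names `ConfusionCounts` and `diagnostic_metrics` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 table: model output versus reference standard."""
    tp: int  # model flags IED, clinical report confirms IED
    fn: int  # model flags no IED, clinical report contains IED
    tn: int  # model flags no IED, clinical report contains no IED
    fp: int  # model flags IED, clinical report contains no IED


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# ASSUMPTION: cell counts inferred from the reported percentages and the
# totals of 484 records and 53 IED-positive records; the abstract does
# not report them directly.
counts = ConfusionCounts(tp=43, fn=10, tn=258, fp=173)
metrics = diagnostic_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

With only 53 of 484 records IED-positive (a prevalence of about 11%), even a moderate false positive rate produces many more false positives than true positives, which is why the positive predictive value is low while the negative predictive value is high.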