Abstract
OBJECTIVE: Electroencephalograms (EEGs) are time-series records of the electrical potential generated by collective neural activity in the brain. EEG waveform patterns, including rhythmic and irregular oscillations and transient sharp waves or spikes, are potential phenotypic biomarkers that reflect genotype-specific neural activity. This is especially relevant for diagnosing epilepsy when seizures are not directly observed. That situation is common in clinical settings and in animal models, which often have subtle neurological phenotypes without overt epilepsy. Herein, we investigate genotype prediction from long-term EEG signals of freely behaving mice belonging to six groups. The groups are defined by the presence or absence of a neurological disease genotype (TSC1 gene knockout) in three inbred strains with distinct genetic backgrounds. APPROACH: We propose a machine learning approach that predicts the genotype of individual mice from the occurrence counts of waveforms that approximate short windows of the EEG. Specifically, a dictionary of waveforms is optimized to approximate windows from each genotype, and the vectors of waveform occurrence counts serve as features for predicting genotype with logistic regression models. MAIN RESULTS: We use two-fold cross-validation for waveform dictionary learning and leave-one-individual-out cross-validation for genotype prediction. Under this protocol, waveform counts pooled over multiple hour-long segments enable reliable prediction of mouse strain, with an accuracy of 70% (95% CI 62-78) compared with a chance rate of 38%. For two of the three strains, DBA2 and C57B6, strain-specific classifiers reliably determined the epilepsy genotype (TSC1 gene knockout), with accuracies of 86% (95% CI 70-101) and 67% (95% CI 55-79), respectively. None of the mice in these strains showed overt seizures, and EEG-based seizure detection identified none.
In comparison, a state-of-the-art time-series classification method (Hydra) achieves higher strain-classification accuracy (98%) and comparable TSC1-genotype prediction for the two strains (86% and 71%, respectively), but it is not interpretable. SIGNIFICANCE: These methods and results show the potential of EEG waveforms as interpretable phenotypes, and of bag-of-waves as a feature representation, for identifying epilepsy genotypes.
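The bag-of-waves pipeline summarized under APPROACH has three steps. Waveform occurrence counts become a feature vector per individual, logistic regression maps features to genotype, and leave-one-individual-out cross-validation scores the classifier. These steps can be illustrated with a minimal, dependency-free Python sketch. This is an illustration under stated assumptions, not the authors' implementation. The dictionary is taken as given rather than optimized. Each non-overlapping window is credited to the single waveform with the largest absolute inner product, a simple stand-in for the paper's dictionary approximation. Counts are normalized to relative frequencies. All function names, the toy spike and oscillation waveforms, and the gradient-descent settings are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _unit(v: Sequence[float]) -> List[float]:
    """Scale a waveform to unit Euclidean norm (zero vectors are left as-is)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def bag_of_waves(signal: Sequence[float], dictionary: Sequence[Sequence[float]]) -> List[int]:
    """Count how often each dictionary waveform best matches a window of the signal.

    Assumption: non-overlapping windows of the waveform length; each window is
    credited to the one atom with the largest absolute inner product. A trailing
    partial window is ignored.
    """
    width = len(dictionary[0])
    atoms = [_unit(a) for a in dictionary]
    counts = [0] * len(atoms)
    for start in range(0, len(signal) - width + 1, width):
        window = signal[start:start + width]
        scores = [abs(sum(w * a for w, a in zip(window, atom))) for atom in atoms]
        counts[scores.index(max(scores))] += 1
    return counts


def to_frequencies(counts: Sequence[int]) -> List[float]:
    """Normalize counts (an assumption) so recordings of different lengths are comparable."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: List[List[float]], y: List[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit an unregularized logistic regression by full-batch gradient descent."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return the predicted genotype label (0 or 1) for one feature vector."""
    return int(_sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


def leave_one_out_accuracy(X: List[List[float]], y: List[int]) -> float:
    """Hold out each individual in turn, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(X)):
        w, b = train_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(w, b, X[i]) == y[i]
    return correct / len(X)


# Toy usage: genotype 0 is spike-dominated, genotype 1 is oscillation-dominated.
SPIKE, OSC = [0, 0, 1, 0], [1, -1, 1, -1]
recordings = [SPIKE * 4 + OSC, SPIKE * 3 + OSC, SPIKE * 5 + OSC,
              SPIKE + OSC * 4, SPIKE + OSC * 3, SPIKE + OSC * 5]
labels = [0, 0, 0, 1, 1, 1]
features = [to_frequencies(bag_of_waves(r, [SPIKE, OSC])) for r in recordings]
print(leave_one_out_accuracy(features, labels))
```

On this separable toy data every held-out individual is classified correctly. With real EEG, the dictionary would first have to be optimized on training windows from each genotype, a step this sketch deliberately omits.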