Abstract
While deep learning has substantially improved the performance of electroencephalography (EEG) analysis, it remains unclear what models such as EEGNet learn from the data and how their learned features relate to neuroscientific concepts. In this work, we introduce a comprehensive interpretability framework for deep learning models of neural data based on Concept Relevance Propagation (CRP), an extension of Layer-wise Relevance Propagation (LRP) that enables the analysis of abstract concepts encoded by individual neurons and filters. We apply CRP to individual filters of convolutional neural networks (EEGNet) trained with leave-one-out cross-validation. To identify common classification strategies across models, we select representative data for individual filters via relevance maximization, reduce dimensionality with UMAP, and group filters encoding similar concepts through density-based clustering. To gain insight into the neural correlates of the classification tasks, we analyze the learned features across multiple data domains without retraining the models. Specifically, we integrate a virtual inspection layer that projects explanations into the frequency domain, enabling the simultaneous analysis of spatial, temporal, and spectral aspects via topographic maps, functional grouping, and independent component analysis (ICA). On three EEG classification tasks (auditory attention, internal/external attention, and motor imagery), we demonstrate that our approach reveals interpretable, task-relevant neural patterns that generalize across participants. Overall, this framework is a step toward understanding the models themselves and gaining neuroscientific insights into the tasks they solve.
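As a minimal sketch of the cross-model clustering step summarized above, the snippet below groups per-filter descriptors with UMAP followed by density-based clustering. The random `filter_features` array is a hypothetical placeholder for CRP-derived relevance descriptors, and the library choices (umap-learn and hdbscan as one density-based clusterer) are assumptions for illustration, not necessarily what the paper used.

```python
# Illustrative sketch (not the authors' code): cluster per-filter relevance
# descriptors from several cross-validated models to find shared concepts.
import numpy as np
import umap      # pip install umap-learn
import hdbscan   # pip install hdbscan

rng = np.random.default_rng(0)
n_models, n_filters, n_features = 20, 16, 128
# Hypothetical stand-in: one CRP-derived descriptor per filter per model.
filter_features = rng.normal(size=(n_models * n_filters, n_features))

# Project the high-dimensional filter descriptors to 2-D.
embedding = umap.UMAP(n_neighbors=15, min_dist=0.1,
                      random_state=0).fit_transform(filter_features)

# Density-based clustering: filters from different models that fall into the
# same cluster are interpreted as encoding a similar concept; -1 marks noise.
labels = hdbscan.HDBSCAN(min_cluster_size=5).fit_predict(embedding)

for cluster_id in sorted(set(labels) - {-1}):
    members = np.where(labels == cluster_id)[0]
    print(f"concept cluster {cluster_id}: {len(members)} filters "
          f"from {len(set(members // n_filters))} models")
```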