Abstract
Convolutional neural networks (CNNs) provide powerful models of neural sensory encoding, but their complexity makes it difficult to discern the computations that support their performance. Here, to address this limitation, we developed a linear-nonlinear subspace model that identifies the most informative sensory dimensions captured by a CNN. A CNN was trained on single-neuron data recorded from auditory cortex of ferrets during presentation of a large natural sound set. Each neuron's linear tuning subspace was computed by applying dimensionality reduction to the gradient of the CNN output with respect to its input. Subspace projections were combined nonlinearly to predict neural activity. The resulting model was functionally equivalent to the CNN. Analysis of trained models showed that responses of local neural populations sparsely tiled a shared stimulus subspace. Encoding properties also differed between cell types and layers, reflecting their position in the cortical circuit. More generally, these results establish a framework for interpreting deep-learning-based encoding models.
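The gradient-based subspace estimation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two-layer network standing in for a trained CNN, the stimulus set, and the subspace dimensionality `k` are all hypothetical, and SVD is assumed as the dimensionality-reduction step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained CNN: a fixed two-layer network
# mapping a flattened stimulus (input_dim) to one neuron's predicted rate.
input_dim, hidden = 40, 8
W1 = rng.standard_normal((hidden, input_dim)) * 0.3
W2 = rng.standard_normal(hidden) * 0.3

def model(x):
    h = np.tanh(W1 @ x)            # hidden nonlinearity
    return W2 @ h                  # scalar output (predicted response)

def grad(x):
    # Analytic gradient of the model output with respect to the input.
    h = np.tanh(W1 @ x)
    return (W2 * (1 - h**2)) @ W1  # shape: (input_dim,)

# Gradients evaluated across many stimuli span the neuron's locally
# linear tuning directions.
stimuli = rng.standard_normal((200, input_dim))
G = np.stack([grad(x) for x in stimuli])

# Dimensionality reduction (here, SVD of the mean-centered gradients)
# yields the linear tuning subspace: the top-k right singular vectors.
k = 3
_, s, Vt = np.linalg.svd(G - G.mean(0), full_matrices=False)
subspace = Vt[:k]                  # (k, input_dim) orthonormal basis

# Project a stimulus into the subspace; in the full model these
# projections are then combined nonlinearly to predict neural activity.
proj = subspace @ stimuli[0]       # shape: (k,)
```

A fitted nonlinearity mapping `proj` to firing rate (not shown) would complete the linear-nonlinear model that the abstract describes as functionally equivalent to the CNN.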