Abstract
The visual system automatically and implicitly learns the statistical regularities (temporal and/or spatial) that characterize the visual scene, an ability referred to as visual statistical learning (VSL). VSL can group several objects with fixed statistical properties into a chunk, a complex process that relies on the collaborative involvement of multiple brain regions. Although behavioral experiments have explored the cognitive functions of VSL, its computational mechanisms remain poorly understood. To address this issue, this study proposes a coupled shape-position recurrent neural network model, based on the anatomical structure of the visual system, to explain how chunk information is learned and represented in neural networks. The model comprises three core modules: a position network, which encodes object position information; a shape network, which encodes object shape information; and a decision network, which integrates the neuronal activity of the position and shape networks to make decisions. The model successfully reproduces the results of a classic spatial VSL experiment. The distribution of neural firing rates in the decision network differs significantly between chunk and non-chunk conditions: neurons in the chunk condition exhibit stronger firing rates than those in the non-chunk condition. Furthermore, after the model learns a scene containing both chunk and non-chunk stimuli, neurons in the position network selectively encode far versus near stimuli, whereas neurons in the shape network distinguish chunks from non-chunks, with chunk-encoding neurons responding selectively to specific chunks. These results indicate that the proposed model learns the spatial regularities of the stimuli to discriminate chunks from non-chunks, and that neurons in the shape network selectively respond to chunk and non-chunk information.
These findings offer theoretical insight into how chunk information is represented in neural networks and provide a new framework for modeling spatial VSL.