Abstract
How transform-invariant visual representations of objects and faces are learned in the ventral visual cortical pathway is a major computational problem. Here we describe key advances towards a biologically plausible four-layer network that performs these computations from the primary visual cortex to the inferior temporal visual cortex. The architecture is a four-layer competitive network with layer-to-layer convergence that uses a short-term memory trace in a local synaptic learning rule to associate the transforming inputs produced by an object during natural viewing. The key advances towards biological plausibility are: (1) a synaptic modification rule that includes long-term depression dependent on synaptic strength, replacing artificial synaptic weight normalization; (2) a limit on the strength of individual synapses, which promotes distributed weights and improves transform-invariant learning; (3) a reduction in the ability of low-firing-rate neurons to participate in learning, analogous to the NMDA receptor non-linearity, which can increase the storage capacity; and (4) demonstrated scalability of the network towards high capacity. These advances have many implications for a better understanding of cortical computation. This biologically plausible approach is compared with artificial networks that model the same ventral cortical processing stream but do not use a local synaptic learning rule and are therefore less biologically plausible, and implications for AI models are described.
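The combination of a short-term memory trace with weight-dependent long-term depression and a cap on synaptic strength can be sketched as a single local update step. This is an illustrative sketch only, not the paper's exact equations: the parameter names (eta, alpha, w_max) and the precise form of the LTD term are assumptions.

```python
import numpy as np

def trace_rule_update(w, x, y, y_trace_prev, eta=0.8, alpha=0.1, w_max=1.0):
    """One hedged sketch of a trace-rule synaptic update.

    w            : current synaptic weight vector onto one neuron
    x            : presynaptic firing-rate vector for this transform
    y            : postsynaptic firing rate for this transform
    y_trace_prev : trace of postsynaptic activity from earlier transforms
    """
    # Exponential short-term memory trace of postsynaptic activity, so
    # that successive transforms of one object drive the same neuron.
    y_trace = (1.0 - eta) * y + eta * y_trace_prev
    # Hebbian potentiation plus long-term depression proportional to the
    # current synaptic strength (the (x - w) form), in place of explicit
    # weight-vector normalization.
    dw = alpha * y_trace * (x - w)
    # Limit synaptic strength, which promotes distributed weights.
    w_new = np.clip(w + dw, 0.0, w_max)
    return w_new, y_trace
```

Under this sketch, the NMDA-like non-linearity of advance (3) would correspond to setting y to zero (so no learning occurs) whenever the postsynaptic rate falls below a threshold.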