Abstract
Shape perception is central to human vision, but its developmental origins remain unknown. Here we show that shape perception develops from three ingredients: (1) generic fitting systems, (2) embodied visual experiences, and (3) biologically plausible sensors. We first show that generic fitting models (transformers) trained on embodied visual experiences change from color-based to shape-based visual systems. We then perform in silico controlled-rearing experiments to determine what causes this developmental change. We find that view diversity (experiencing many views of the same object) produces shape perception. For embodied agents, view diversity comes for free: by moving through the world, agents acquire diverse, temporally linked views, explaining how and why animals develop shape perception so rapidly. But when view diversity is restricted, by limiting where an agent can look or move, shape perception fails to develop. Finally, we show that retinas naturally transform images in ways that enhance shape learning, providing a biologically plausible substitute for artificial image augmentations. Together, our results support generic fitting theories of brain development and provide a template for building human-like shape perception in machines.