Abstract
Humans and other animals move their eyes, heads, and bodies to interact with their surroundings. Although these movements are essential for survival, they produce additional sensory signals that complicate visual scene analysis. At the same time, these self-generated visual signals carry valuable information about self-motion and the three-dimensional structure of the environment. In this review, we examine recent advances in understanding depth and motion perception during self-motion, along with the underlying neural mechanisms. We also propose a comprehensive framework that integrates a range of visual phenomena, including optic flow parsing, depth from motion parallax, and coordinate transformation. The studies reviewed here begin to provide a more complete picture of how the visual system carries out a set of complex computations to jointly infer object motion, self-motion, and depth.