Investigating the temporal dynamics and modeling of mid-level feature representations in humans


Abstract

Visual perception unfolds through a hierarchy of transformations, beginning with the extraction of low-level features, such as edges, and culminating in the representation of high-level features, such as object categories. While the processing of low- and high-level features is well studied, the intermediate transformations, that is, mid-level features, remain poorly understood. Here, we introduce a stimulus set of naturalistic 3D-rendered images and videos with ground-truth annotations for five candidate mid-level features (reflectance, scene depth, world normals, lighting, and skeleton position), alongside one low-level feature (edges) and one high-level feature (action identity). To determine when these features are processed in the brain, we collected electroencephalography (EEG) responses during stimulus presentation and trained linearized encoding models to predict EEG responses from the annotations. We first showed that candidate mid-level features were best represented between ~100 and 250 ms post-stimulus, temporally between low- and high-level features, consistent with a bridging role linking sensory and semantic processing. We then assessed convolutional neural networks (CNNs) as models of mid-level feature processing in humans and observed that, although their hierarchies were shallower, they exhibited a comparable processing order for mid-level features, though only for videos, and not for low- or high-level features. Together, our results support the view that mid-level features are tied to surface- and shape-related processing and establish 3D-rendered stimuli with annotations as a valuable tool for investigating mid-level vision in biological and artificial neural networks.
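The linearized encoding approach described above can be sketched as follows. This is a minimal illustration, not the authors' exact pipeline: a ridge regression (one common choice of linearized model) maps feature annotations to EEG amplitudes separately at each time point, and cross-validated prediction accuracy over time indicates when a feature is best represented. All array sizes and the simulated data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_stim, n_feat, n_chan, n_time = 200, 10, 32, 50  # hypothetical dimensions

X = rng.standard_normal((n_stim, n_feat))          # feature annotations per stimulus
W = rng.standard_normal((n_feat, n_chan, n_time))  # latent feature-to-EEG mapping
Y = np.einsum('sf,fct->sct', X, W)                 # simulated EEG responses
Y += 0.5 * rng.standard_normal(Y.shape)            # additive sensor noise

scores = np.zeros(n_time)
for t in range(n_time):
    fold_r = []
    for train, test in KFold(n_splits=5).split(X):
        # fit one linear model per time point, predicting all channels at once
        model = Ridge(alpha=1.0).fit(X[train], Y[train, :, t])
        pred = model.predict(X[test])
        # mean Pearson correlation between predicted and observed EEG, across channels
        r = [np.corrcoef(pred[:, c], Y[test, c, t])[0, 1] for c in range(n_chan)]
        fold_r.append(np.mean(r))
    scores[t] = np.mean(fold_r)

# the time course of `scores` shows when the annotated feature is
# best represented in the (here, simulated) EEG signal
```

Comparing such score time courses across feature types (edges vs. mid-level annotations vs. action identity) is what allows the latency ordering reported in the abstract to be estimated.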
