Abstract
BACKGROUND: Studies have shown that humans can rapidly learn the shapes of new objects and adjust their behavior when encountering novel situations. Research on visual cognition further indicates that the ventral visual pathway plays a critical role in core object recognition. Existing studies, however, often focus on microscopic simulations of individual neural structures; few adopt a holistic, system-level perspective, which makes robust few-shot learning difficult to achieve. METHOD: Inspired by the mechanisms and processing stages of the ventral visual stream, this paper proposes a computational model with a macroscopic neural architecture for few-shot learning. We reproduce the feature-extraction functions of V1 and V2 with a pretrained Vision Transformer (ViT) and model neuronal activity in V4 and IT with two neural fields. By connecting these neurons according to Hebbian learning rules, the model stores the feature and category information of the input samples during training on the support set. RESULTS: Using a scale-adaptation strategy, the model emulates visual neural mechanisms and learns efficiently; in comparative experiments on real-world image datasets, it outperforms state-of-the-art few-shot learning algorithms, demonstrating human-like learning capabilities. CONCLUSION: Our ventral-stream-inspired model achieves effective few-shot learning on real-world image datasets.
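The Hebbian storage step described in METHOD can be illustrated with a minimal outer-product associative memory: feature vectors (standing in for ViT/V4 activity) are bound to category units (standing in for IT activity) during support training, then queries are classified by the strongest stored association. This is a hedged sketch under assumed dimensions; it does not reproduce the paper's neural-field dynamics or scale-adaptation strategy.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_classes = 64, 5          # hypothetical dimensions
W = np.zeros((n_classes, n_features))  # feature -> category weights

def hebbian_store(W, feature, label, lr=1.0):
    """Outer-product Hebbian update: strengthen connections between
    co-active feature units and the active category unit."""
    y = np.zeros(W.shape[0])
    y[label] = 1.0
    return W + lr * np.outer(y, feature)

def classify(W, feature):
    """Predict the category whose stored weights respond most strongly."""
    return int(np.argmax(W @ feature))

# "Support training": store one normalized feature vector per class (1-shot).
prototypes = rng.normal(size=(n_classes, n_features))
prototypes /= np.linalg.norm(prototypes, axis=1, keepdims=True)
for c in range(n_classes):
    W = hebbian_store(W, prototypes[c], c)

# Query: a mildly noisy view of class 2 is still recognized.
query = prototypes[2] + 0.1 * rng.normal(size=n_features)
print(classify(W, query))
```

Because high-dimensional random prototypes are nearly orthogonal, a single Hebbian pass per class suffices for recall here, which is the sense in which such storage supports few-shot learning.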