Abstract
Few-Shot Object Detection (FSOD) aims to detect novel object categories from only a few labeled examples, a capability with broad application prospects in real-world scenarios. Previous approaches often pay insufficient attention to critical information, which yields low-quality prototypes and suboptimal performance in few-shot settings. To address this limitation, this paper proposes an improved FSOD network that mimics the human visual attention mechanism by emphasizing regions that are semantically important and rich in spatial information. First, an Importance-Weighted Local Adaptive Prototype module highlights key local features of the support samples and assigns greater weights to salient regions, producing more expressive class prototypes and improving generalization under few-shot settings. Second, an Imbalanced Diversity Sampling module selects diverse and challenging negative sample prototypes, which enhances inter-class separability and reduces confusion among visually similar categories. Third, a Weighted Non-Linear Fusion module integrates several forms of feature interaction, with learnable importance weights modulating the contribution of each interaction to improve feature fusion. Extensive experiments on the PASCAL VOC and MS COCO benchmarks validate the effectiveness of the method. Compared with Fine-Grained Prototypes Distillation (FPD), our method improves mean average precision by 2.84% on PASCAL VOC and surpasses the FPD baseline in AP by 0.8% and 1.8% in the two reported settings on MS COCO.
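The idea of assigning greater weights to salient regions when forming a class prototype can be illustrated with a minimal sketch. It is not the Importance-Weighted Local Adaptive Prototype module itself: it assumes that each local feature of a support sample comes with a raw importance score, that the scores are normalized with a softmax, and that the prototype is the weighted sum of the local features. The names `softmax`, `weighted_prototype`, `local_features`, and `importance_scores` are hypothetical and chosen only for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_prototype(local_features, importance_scores):
    """Pool N local feature vectors into one prototype vector.

    local_features: list of N vectors (each a list of D floats); stands in
        for local features of a support sample (an assumption of this sketch).
    importance_scores: list of N raw scores; a higher score marks a more
        salient region (also an assumption of this sketch).
    """
    if len(local_features) != len(importance_scores):
        raise ValueError("one importance score is needed per local feature")
    weights = softmax(importance_scores)  # non-negative, sums to 1
    dim = len(local_features[0])
    prototype = [0.0] * dim
    for w, feat in zip(weights, local_features):
        for d in range(dim):
            prototype[d] += w * feat[d]
    return prototype


# Three toy local features; the third one plays the role of a salient region.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
uniform = weighted_prototype(features, [0.0, 0.0, 0.0])  # plain average
salient = weighted_prototype(features, [0.0, 0.0, 5.0])  # pulled toward [1, 1]
print(uniform)
print(salient)
```

With equal scores the prototype reduces to the plain average of the local features; raising the score of one region pulls the prototype toward that region's feature vector, which is the behavior the weighting is meant to provide.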