Abstract
Most existing small object detection methods rely on residual blocks to process deep feature maps. However, these residual blocks, which stack multiple large-kernel convolution layers, incur high computational costs and carry redundant information, making it difficult to improve detection performance for small objects. To address this, we design an improved feature pyramid network, the L Feature Pyramid Network (L-FPN), which reconstructs the original FPN structure to optimize the allocation of computational resources for small object detection. Based on L-FPN, we further propose a small object detector named BPD-YOLO. First, we introduce a Dual-phase Asymptotic Feature Fusion mechanism (DAFF): the shallow and deep semantic features extracted by the backbone are first fused in parallel to narrow the semantic gap between them, and the intermediate semantic layers are then progressively integrated, so that shallow and deep feature representations are fused effectively. Second, we design a Deep Spatial Pyramid Fusion module (DSPF), which generates multi-scale feature representations in place of conventional stacked residual blocks, thereby reducing computational overhead. In the shallow feature extraction stage, DSPF focuses on semantic integration, which enhances the extraction of small object features. We refer to this strategy of adaptively selecting modules according to feature map resolution as the Decoupled feature Extraction-semantic Integration mechanism (DEI). Finally, we conduct extensive experiments and thorough evaluations on the VisDrone and TinyPerson datasets. On VisDrone, compared with the baseline model YOLOv8n + P2, BPD-YOLO with L-FPN improves mAP50 by 2.8% and mAP50-95 by 1.4%.
On TinyPerson, BPD-YOLO further demonstrates its strength in high-resolution feature extraction, improving detection accuracy while substantially reducing computational cost.
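The abstract describes the DAFF fusion order and the DEI resolution-based dispatch only at a high level. The NumPy sketch below illustrates one possible reading of those two ideas; it is not the authors' implementation. The functions `resize_to`, `daff`, `multi_scale_pyramid`, and `dei_block`, the nearest-neighbour resizing, the element-wise additions, the pooling scales (1, 2, 4), and the 32-pixel resolution threshold are all hypothetical choices. All pyramid levels are assumed to share one channel count; a real FPN would align channels with learned 1×1 convolutions, which are omitted here.

```python
import numpy as np


def resize_to(x, h_out, w_out):
    """Nearest-neighbour resize of a (C, H, W) map to (C, h_out, w_out)."""
    _, h, w = x.shape
    rows = np.arange(h_out) * h // h_out
    cols = np.arange(w_out) * w // w_out
    return x[:, rows][:, :, cols]


def daff(p2, p3, p4, p5):
    """Hypothetical two-phase fusion order (additive fusion is an assumption)."""
    # Phase 1: fuse the shallowest (P2) and deepest (P5) levels in parallel.
    p2_f = p2 + resize_to(p5, *p2.shape[1:])
    p5_f = p5 + resize_to(p2, *p5.shape[1:])
    # Phase 2: progressively fold in the intermediate levels, shallow to deep.
    p3_f = p3 + resize_to(p2_f, *p3.shape[1:]) + resize_to(p5_f, *p3.shape[1:])
    p4_f = p4 + resize_to(p3_f, *p4.shape[1:]) + resize_to(p5_f, *p4.shape[1:])
    return p2_f, p3_f, p4_f, p5_f


def multi_scale_pyramid(x, scales=(1, 2, 4)):
    """Stand-in for multi-scale generation: average-pool at several scales,
    upsample back, and average. H and W must be divisible by every scale."""
    c, h, w = x.shape
    outs = []
    for k in scales:
        pooled = x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))
        outs.append(np.repeat(np.repeat(pooled, k, axis=1), k, axis=2))
    return np.mean(outs, axis=0)


def dei_block(x, context, hi_res_threshold=32):
    """Hypothetical DEI-style dispatch: choose a block by spatial resolution."""
    if x.shape[1] >= hi_res_threshold:
        # Shallow, high-resolution map: cheap integration of deeper context.
        return "semantic_integration", x + resize_to(context, *x.shape[1:])
    # Deep, low-resolution map: multi-scale pooling instead of stacked residual blocks.
    return "multi_scale_pyramid", multi_scale_pyramid(x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy P2..P5 maps with 4 channels and strides 4, 8, 16, 32 of a 256x256 input.
    feats = [rng.standard_normal((4, s, s)) for s in (64, 32, 16, 8)]
    fused = daff(*feats)
    for f in fused:
        name, out = dei_block(f, context=fused[-1])
        print(f.shape[1], name, out.shape)
```

Keeping the dispatch as a plain resolution test mirrors the abstract's description of DEI as selecting modules by feature map resolution; in a trained network each branch would contain learned layers rather than fixed pooling and addition.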