A lightweight small object detection model for UAV images based on deep semantic integration

Abstract

Most existing small object detection methods rely on residual blocks to process deep feature maps. However, these residual blocks, composed of multiple large-kernel convolution layers, incur high computational costs and contain redundant information, which makes it difficult to improve detection performance for small objects. To address this, we design an improved feature pyramid network called L Feature Pyramid Network (L-FPN), which optimizes the allocation of computational resources for small object detection by reconstructing the original FPN structure. Based on L-FPN, we further propose a small object detector named BPD-YOLO. We introduce a Dual-phase Asymptotic Feature Fusion mechanism (DAFF), in which the shallow and deep semantic features extracted from the backbone network are first fused in parallel to mitigate the semantic gap. The intermediate semantic layers are then progressively integrated, enabling effective fusion of both shallow and deep feature representations. Additionally, we design the Deep Spatial Pyramid Fusion module (DSPF), which generates multi-scale feature representations as an alternative to conventional residual block stacking, thereby reducing computational overhead. In the shallow feature extraction stage, DSPF focuses on semantic integration and enhances the extraction of small object features. This strategy, which adaptively selects different modules based on the resolution of the feature maps, is referred to as the Decoupled feature Extraction-semantic Integration mechanism (DEI). Finally, we conduct extensive experiments and thorough evaluations on both the VisDrone and TinyPerson datasets. The results demonstrate that, on the VisDrone dataset, compared to the baseline model YOLOv8n + p2, our BPD-YOLO model with L-FPN achieves a 2.8% improvement in mAP50 and a 1.4% increase in mAP50-95. On the TinyPerson dataset, BPD-YOLO further demonstrates its superiority in high-resolution feature extraction, effectively enhancing detection accuracy while significantly reducing computational costs.
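
The abstract does not describe the internals of DSPF, so the following is only an illustrative sketch of the general idea of obtaining multi-scale representations from pooling rather than from stacked convolution blocks. It assumes an SPPF-style cascade of stride-1 max-pooling operations, as used in the YOLO family, and is not the authors' module. The function names `max_pool_same` and `pyramid_features` are hypothetical, and a single-channel feature map is represented as a plain list of lists to keep the example dependency-free.

```python
def max_pool_same(grid, k):
    """Stride-1 max pooling with a k x k window.

    The window is clipped at the borders, so the output has the
    same height and width as the input.
    """
    h, w = len(grid), len(grid[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                grid[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def pyramid_features(grid, k=3, levels=3):
    """Return [input, pool_1, ..., pool_levels] (assumed SPPF-style cascade).

    Each level pools the previous level's output, so the effective
    receptive field grows (3x3, 5x5, 7x7 for k=3) without adding any
    learnable weights. In a real network these maps would be
    concatenated along the channel axis and mixed by a 1x1 convolution.
    """
    features = [grid]
    current = grid
    for _ in range(levels):
        current = max_pool_same(current, k)
        features.append(current)
    return features


# Usage: a 7x7 map with a single activation at the centre.
grid = [[0] * 7 for _ in range(7)]
grid[3][3] = 1
features = pyramid_features(grid)
coverage = [sum(map(sum, f)) for f in features]  # active cells per level
```

In this toy input, the single activation spreads over a progressively larger neighbourhood at each level, which is the sense in which cascaded pooling yields several scales from one feature map at low cost.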
