A Dual-Modal Adaptive Pyramid Transformer Algorithm for UAV Cross-Modal Object Detection

Abstract

Unmanned Aerial Vehicles (UAVs) play vital roles in traffic surveillance, disaster management, and border security, highlighting the importance of reliable infrared-visible image detection under complex illumination conditions. However, UAV-based infrared-visible detection still faces challenges in multi-scale target recognition, robustness to lighting variations, and efficient cross-modal information utilization. To address these issues, this study proposes a lightweight Dual-modality Adaptive Pyramid Transformer (DAP) module integrated into the YOLOv8 framework. The DAP module employs a hierarchical self-attention mechanism and a residual fusion structure to achieve adaptive multi-scale representation and cross-modal semantic alignment while preserving modality-specific features. This design enables effective feature fusion with reduced computational cost, enhancing detection accuracy in complex environments. Experiments on the DroneVehicle and LLVIP datasets demonstrate that the proposed DAP-based YOLOv8 achieves mAP(50:95) scores of 61.2% and 62.1%, respectively, outperforming conventional methods. The results validate the capability of the DAP module to optimize cross-modal feature interaction and improve UAV real-time infrared-visible target detection, offering a practical and efficient solution for UAV applications such as traffic monitoring and disaster response.
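The fusion idea described in the abstract (attention between the infrared and visible branches at several scales, with a residual path that keeps each modality's own features) can be illustrated with a minimal sketch. This is not the authors' DAP module: it has no learned projections, no YOLOv8 integration, and uses plain average pooling to build the pyramid. All function names (`cross_attend`, `residual_fuse`, `avg_pool`, `fuse_pyramid`) and the pooling factors are assumptions made for illustration only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query_tokens, context_tokens):
    """Scaled dot-product attention: each query token takes a weighted
    average of the context tokens (no learned weights in this sketch)."""
    d = len(query_tokens[0])
    out = []
    for q in query_tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in context_tokens]
        w = softmax(scores)
        out.append([sum(wi * k[j] for wi, k in zip(w, context_tokens))
                    for j in range(d)])
    return out


def residual_fuse(own, other):
    """own + attention(own -> other). The residual term keeps the
    modality-specific features; the attended term adds cross-modal context."""
    attended = cross_attend(own, other)
    return [[o + a for o, a in zip(o_tok, a_tok)]
            for o_tok, a_tok in zip(own, attended)]


def avg_pool(tokens, factor):
    """Average consecutive groups of `factor` tokens to get a coarser level."""
    d = len(tokens[0])
    pooled = []
    for i in range(0, len(tokens), factor):
        group = tokens[i:i + factor]
        pooled.append([sum(t[j] for t in group) / len(group) for j in range(d)])
    return pooled


def fuse_pyramid(visible, infrared, factors=(1, 2)):
    """Apply residual cross-modal fusion at each (hypothetical) pyramid level.
    Returns a list of (fused_visible, fused_infrared) pairs, fine to coarse."""
    levels = []
    for f in factors:
        v, r = avg_pool(visible, f), avg_pool(infrared, f)
        levels.append((residual_fuse(v, r), residual_fuse(r, v)))
    return levels


# Toy input: four 2-dimensional tokens per modality (made-up values).
visible = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0], [0.2, 0.8]]
infrared = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.3, 0.7]]
for level, (fv, fr) in enumerate(fuse_pyramid(visible, infrared)):
    print(level, len(fv), len(fr))
```

Both branches are kept separate after fusion, which mirrors the abstract's point about preserving modality-specific features; a real detector would add learned query, key, and value projections and feed each level to the detection head.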
