Multi-path attention and scale-aware fusion for accurate object detection in remote sensing imagery

Abstract

Accurate yet computationally efficient object detection in remote sensing imagery remains central to intelligent interpretation systems. Despite substantial recent progress, prevailing approaches remain deficient in three respects: the discriminative capacity of feature representations, the depth of semantic modeling, and the effectiveness of multi-scale information fusion. These shortcomings are most pronounced for small targets, which are easily missed or misclassified. To address them, this work introduces HyperFusion-DEIM, a cascaded detection paradigm designed to simultaneously reinforce object-level representations, enrich contextual semantic dependencies, and optimize scale-aware feature integration. Central to the framework is the Multi-Path Attention Network (MAPNet), which strengthens shallow semantic cues and edge-texture sensitivity for small-object recognition through the joint operation of the Multi-Path Attention Fusion (MPAF) module and the Shallow Robust Feature Downsampling (SRFD) mechanism. Complementing this, the Scale-Aware Feature Enhancement (SAFE) encoder incorporates a Multi-level Feature Concentration (MFC) module to achieve cross-layer geometric alignment, while integrating Transformer layers with HyperACE captures long-range semantic correlations without compromising spatial fidelity. Experiments on the SIMD and VEDAI benchmarks show that HyperFusion-DEIM outperforms state-of-the-art lightweight detectors in both accuracy and robustness. Specifically, the model attains 64.5% AP on SIMD, outperforming RT-DETR and DEIM by 4.8% and 4.6%, respectively, while sustaining a peak inference throughput of 296.33 FPS. On VEDAI, HyperFusion-DEIM surpasses YOLOv12 and YOLOv13 by 4.9% and 8.0%, and exceeds RT-DETRv2 and DEIM by 2.5% and 8.5%, respectively, while maintaining real-time operation at 79.7 FPS. These results indicate that HyperFusion-DEIM is practical for real-time detection, particularly in resource-constrained environments where both speed and accuracy are critical.
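
To put the reported figures in more concrete terms, the following minimal sketch converts the quoted throughputs into per-image latencies and derives the baseline AP values implied on SIMD. It assumes that the reported margins are absolute AP points (the abstract does not say whether they are absolute or relative), and the helper names `latency_ms` and `implied_baseline` are hypothetical, introduced only for this illustration.

```python
def latency_ms(fps: float) -> float:
    """Average per-image latency in milliseconds implied by a throughput in FPS."""
    return 1000.0 / fps


def implied_baseline(ap: float, margin: float) -> float:
    """Baseline AP implied by a reported margin.

    ASSUMPTION: the margin is an absolute difference in AP points,
    not a relative improvement. The abstract does not specify which.
    """
    return round(ap - margin, 1)


# Figures quoted in the abstract.
SIMD_AP = 64.5
SIMD_MARGINS = {"RT-DETR": 4.8, "DEIM": 4.6}
FPS = {"SIMD": 296.33, "VEDAI": 79.7}

simd_baselines = {name: implied_baseline(SIMD_AP, m) for name, m in SIMD_MARGINS.items()}
latencies = {name: round(latency_ms(f), 2) for name, f in FPS.items()}

print(simd_baselines)  # {'RT-DETR': 59.7, 'DEIM': 59.9}
print(latencies)       # {'SIMD': 3.37, 'VEDAI': 12.55}
```

Under that assumption, RT-DETR and DEIM would sit at roughly 59.7% and 59.9% AP on SIMD. Because 296.33 FPS is described as a peak throughput, the 3.37 ms figure is best read as a best-case latency rather than a typical one.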
