CATR: CNN augmented transformer for object detection in remote sensing imagery



Abstract

Object detection in high-resolution aerial imagery is challenging because of scale variation, occlusion, background clutter, and the scarcity of annotated datasets. CNN-based detectors such as YOLO and Faster R-CNN have advanced the field, but they do not capture long-range dependencies effectively. We propose a CNN-augmented detection transformer, which we call CATR. We compare the proposed framework with the transformer-based DETR and with state-of-the-art CNN detectors on the DOTA dataset. DETR, with its end-to-end transformer and direct set prediction, streamlines the pipeline by removing anchor boxes and non-maximum suppression, which improves robustness in cluttered aerial scenes. In our experiments, DETR achieves higher accuracy (72% mAP@0.5), outperforming the CNN detectors by up to 13%. However, DETR has a higher computational cost (86.3 GFLOPs) and a lower inference speed (12 FPS). The proposed hybrid CNN-transformer architecture balances accuracy and speed: it combines CNN features with global attention to improve small-object detection and is further augmented by CNN-based segmentation. This study confirms that transformer models, especially when combined with CNNs, are highly promising for complex aerial environments, offering a strong alternative to traditional CNNs by modeling context and occlusion globally. Although efficiency improvements are still ongoing, this research provides a useful path for future geospatial applications, including remote sensing and disaster response.
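For readers unfamiliar with the mAP@0.5 figure quoted above, the sketch below illustrates the matching criterion behind it: a predicted box counts as a true positive when its intersection over union (IoU) with a ground-truth box is at least 0.5. This is a minimal illustration, not the authors' evaluation code. It assumes axis-aligned boxes in `(x_min, y_min, x_max, y_max)` form, and the function names `iou` and `is_true_positive` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max). This layout is an
    assumption made for illustration only.
    """
    # Corners of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes give no intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """True when the prediction overlaps the ground truth enough
    to count as a match at the given IoU threshold (0.5 for mAP@0.5)."""
    return iou(pred_box, gt_box) >= threshold
```

As a usage example, `is_true_positive((0, 0, 10, 10), (2, 0, 12, 10))` is a match because the IoU is 80/120, whereas shifting the ground-truth box to `(5, 0, 15, 10)` lowers the IoU to 50/150 and the prediction no longer counts. Mean average precision then averages, over object classes, the area under the precision-recall curve built from such matches.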
