Multiscale object detection model based on pyramid vision transformer


Abstract

The sizes of objects within an image can vary significantly. In fields such as construction, manufacturing, and healthcare, where image analysis directly affects human life and safety, accurately detecting even small objects is crucial. To address this, the present study proposes a multiscale object detection model that employs the Pyramid Vision Transformer (PVT) as the backbone network of the YOLO model. This approach compensates for the limitations of the Spatial Pyramid Pooling - Fast (SPPF) module used in conventional YOLO models and improves detection accuracy for small objects. The proposed transformer-based multiscale detection model captures long-range dependencies while simplifying complex pre-processing and post-processing procedures. It also generates feature maps at multiple resolutions, enabling multiscale feature representation and object detection. In particular, Global Self-Attention lets the model draw on contextual information from the entire image, improving its understanding of relationships between objects and of the scene as a whole.
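The core idea of the abstract, a backbone that repeatedly patch-embeds the input into progressively coarser feature maps and applies global self-attention at each stage, can be sketched in a few lines of NumPy. This is a minimal illustration of the mechanism only: the patch sizes, channel widths, number of stages, and random projection matrices below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patch_embed(x, patch, dim, rng):
    """Split an (H, W, C) map into non-overlapping patches and project to `dim` channels."""
    H, W, C = x.shape
    h, w = H // patch, W // patch
    tokens = (x[:h * patch, :w * patch]
              .reshape(h, patch, w, patch, C)
              .transpose(0, 2, 1, 3, 4)
              .reshape(h * w, patch * patch * C))
    W_proj = rng.standard_normal((tokens.shape[1], dim)) / np.sqrt(tokens.shape[1])
    return tokens @ W_proj, (h, w)

def global_self_attention(tokens, rng):
    """Every token attends to every other token, so each position sees the whole image."""
    n, d = tokens.shape
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))  # (n, n) attention over all positions
    return attn @ v

rng = np.random.default_rng(0)
x = rng.random((64, 64, 3))  # toy input image
pyramid = []
# Each stage halves the spatial resolution and widens the channels,
# yielding the multiscale feature maps a detection head would consume.
for stage, dim in enumerate([32, 64, 128], start=1):
    tokens, (h, w) = patch_embed(x, patch=4 if stage == 1 else 2, dim=dim, rng=rng)
    tokens = global_self_attention(tokens, rng)
    x = tokens.reshape(h, w, dim)  # fold tokens back into a spatial feature map
    pyramid.append(x.shape)
print(pyramid)  # [(16, 16, 32), (8, 8, 64), (4, 4, 128)]
```

The printed shapes show the pyramid structure: successive stages trade spatial resolution for channel depth, so fine-grained early maps can serve small-object detection while coarse late maps summarize global context.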
