Channel-spatial attention modules in convolutional neural networks for image classification


Abstract

Many studies in recent years have established that attention mechanisms have great potential for improving the performance of Convolutional Neural Networks (CNNs) on image classification problems. Combining channel and spatial attention modules is one family of attention mechanisms inspired by the visual perception of the human brain. So far, no work has compared the parallel and sequential ways of combining channel-spatial attention modules comprehensively and accurately enough to say definitively which achieves the better balance between model performance and computational complexity. In this paper, we introduce two new channel-spatial attention modules, the Parallel Channel-Spatial Attention Module (PCSAM) and the Sequential Channel-Spatial Attention Module (SCSAM), which can be embedded in the architecture of any CNN. Each proposed module is composed of a channel attention sub-module and a spatial attention sub-module. The Channel Attention Module (CAM) and the Spatial Attention Module (SAM) help the network extract, respectively, the channels relevant to the structure of the Region of Interest (RoI) and its location in the input feature maps. We increase the representational power of the attention-based networks by extracting features with both Global Average Pooling (GAP) and Global Maximum Pooling (GMP) in the CAM and the SAM. In addition, a Dilated Convolution (DC) layer is employed in the SAM instead of a standard convolution to better focus on the RoI in the feature maps. The PCSAM and SCSAM are integrated into the ResNet18 and MobileNetv4 architectures to produce ResNet18PCSAM, ResNet18SCSAM, MobileNetv4PCSAM, and MobileNetv4SCSAM. All networks are trained and evaluated under the same experimental conditions for 50 epochs on three general image classification datasets: CIFAR-10, CIFAR-100, and Tiny-ImageNet.
Classification results on the test sets show that MobileNetv4SCSAM is more efficient than the other architectures on all datasets, and it also outperforms previously proposed channel-spatial attention modules.
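To make the parallel/sequential distinction concrete, the following is a minimal NumPy sketch of the two wirings described in the abstract. The specific weights, the 1×1-style spatial fusion (used here in place of the paper's dilated convolution for brevity), and the product-based parallel fusion are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """CAM sketch: GAP and GMP over the spatial dims feed a shared two-layer MLP."""
    gap = x.mean(axis=(1, 2))                        # (C,) global average pooling
    gmp = x.max(axis=(1, 2))                         # (C,) global max pooling
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)     # shared bottleneck MLP with ReLU
    return sigmoid(mlp(gap) + mlp(gmp))              # (C,) channel weights in (0, 1)

def spatial_attention(x, wa, wb):
    """SAM sketch: channel-wise mean/max maps fused by a 1x1-style weighting
    (the paper uses a dilated convolution here; a 1x1 mix keeps the sketch short)."""
    mean_map = x.mean(axis=0)                        # (H, W)
    max_map = x.max(axis=0)                          # (H, W)
    return sigmoid(wa * mean_map + wb * max_map)     # (H, W) spatial weights in (0, 1)

def scsam(x, params):
    """Sequential wiring: channel attention first, then spatial attention."""
    w1, w2, wa, wb = params
    xc = x * channel_attention(x, w1, w2)[:, None, None]
    return xc * spatial_attention(xc, wa, wb)[None, :, :]

def pcsam(x, params):
    """Parallel wiring: both attentions computed from the same input and fused
    by an elementwise product (one plausible fusion, assumed for illustration)."""
    w1, w2, wa, wb = params
    mc = channel_attention(x, w1, w2)[:, None, None]   # (C, 1, 1)
    ms = spatial_attention(x, wa, wb)[None, :, :]      # (1, H, W)
    return x * mc * ms

# Toy feature map: C channels, H x W spatial grid, reduction ratio r in the MLP.
C, H, W, r = 8, 6, 6, 2
params = (rng.normal(size=(C // r, C)),   # MLP reduction weights (assumed shapes)
          rng.normal(size=(C, C // r)),   # MLP expansion weights
          0.5, 0.5)                       # spatial mixing coefficients
x = rng.normal(size=(C, H, W))
print(scsam(x, params).shape, pcsam(x, params).shape)  # both (8, 6, 6)
```

Note the key structural difference: in the sequential module the spatial weights are computed from the channel-refined features, whereas in the parallel module both attention maps are derived independently from the same input before being fused.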
