FasterMLP efficient vision networks combining attention mechanisms and wavelet downsampling


Abstract

Integrating multi-layer perceptrons (MLPs), convolutional neural networks (CNNs), and attention mechanisms has been shown to significantly improve model performance across computer vision tasks. This paper proposes FasterMLP, a lightweight neural network architecture that achieves both high computational efficiency and high accuracy, making it well suited to resource-constrained and real-time applications. FasterMLP combines the local connectivity and weight sharing of CNNs with the global feature representation of MLPs; it strengthens feature extraction with the Convolutional Block Attention Module and reduces spatial dimensions via Haar wavelet downsampling without sacrificing critical feature information. The four-stage architecture is evaluated on multiple benchmarks. On ImageNet-1K, FasterMLP-S achieves a top-1 accuracy 3.9% higher than MobileViT-XXS while running 2× faster on GPU and 2.7× faster on CPU. On COCO, FasterMLP-L matches the performance of FasterNet-L with significantly fewer parameters, and on Cityscapes it reaches a mean Intersection-over-Union of 81.7%, surpassing existing methods such as CCNet and DANet. These results demonstrate that FasterMLP effectively balances computational efficiency and accuracy, making it particularly suitable for visual perception in resource-constrained, real-time settings such as autonomous driving. Code is available at https://github.com/windisl/FasterMLP.
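The Haar wavelet downsampling mentioned in the abstract halves spatial resolution without discarding information: each 2×2 block is transformed into one low-frequency (LL) and three high-frequency (LH, HL, HH) subbands, which are stacked as extra channels. A minimal NumPy sketch of the idea follows; the function name and the orthonormal 1/2 normalization are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def haar_downsample(x):
    """Single-level 2D Haar transform as a downsampling step.

    Maps a feature map of shape (C, H, W) to (4*C, H/2, W/2):
    spatial size halves, and the four Haar subbands (LL, LH, HL, HH)
    are concatenated along the channel axis, so no information is lost.
    """
    a = x[:, 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[:, 0::2, 1::2]  # top-right
    d = x[:, 1::2, 0::2]  # bottom-left
    e = x[:, 1::2, 1::2]  # bottom-right
    ll = (a + b + d + e) / 2.0   # low-frequency approximation
    lh = (-a - b + d + e) / 2.0  # horizontal detail
    hl = (-a + b - d + e) / 2.0  # vertical detail
    hh = (a - b - d + e) / 2.0   # diagonal detail
    return np.concatenate([ll, lh, hl, hh], axis=0)
```

With this normalization the transform is orthonormal, so it is exactly invertible (e.g. the top-left pixel is recovered as `(ll - lh - hl + hh) / 2`) and preserves the total energy of the input, which is what allows the network to downsample without sacrificing critical feature information.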
