Enhancing Food Image Recognition by Multi-Level Fusion and the Attention Mechanism



Abstract

As a pivotal area of computer vision research, food recognition has become indispensable in domains such as dietary nutrition monitoring, intelligent restaurant services, and quality control in the food industry. However, food image recognition is a Fine-Grained Visual Classification (FGVC) problem, which presents challenges such as inter-class similarity, intra-class variability, and the difficulty of capturing intricate local features. Research on fine-grained visual classification has primarily focused on the deep features of deep convolutional neural networks, often neglecting shallow, detailed information. Taking these factors into account, we propose a Multi-level Attention Feature Fusion Network (MAF-Net). Specifically, we use the feature maps generated at different stages of a Convolutional Neural Network (CNN) backbone as inputs. We apply a self-attention mechanism to identify local features on these feature maps and then stack them together. The feature vectors obtained through the attention mechanism are then integrated with the original input to enhance data augmentation. Simultaneously, to capture as many local features as possible, we encourage the multi-scale features at each stage to concentrate on distinct local regions by maximizing the Kullback-Leibler divergence (KL-divergence) between the different stages. Additionally, we present a novel subclass center loss (SCloss) that implements label smoothing, minimizes intra-class feature distribution differences, and enhances the model's generalization capability. Experiments on three food image datasets (CETH Food-101, Vireo Food-172, and UEC Food-100) demonstrated the superiority of the proposed model, which achieved Top-1 accuracies of 90.22%, 89.86%, and 90.61%, respectively. Notably, our method not only outperformed other methods in Top-5 accuracy on Vireo Food-172 but also achieved the highest Top-1 accuracy on UEC Food-100.
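
The idea of pushing different stages to attend to distinct regions by maximizing KL-divergence can be illustrated with a minimal sketch. It is not the MAF-Net implementation: the function names (`softmax`, `kl_divergence`, `stage_diversity_loss`), the averaging over ordered stage pairs, and the assumption that each stage's attention scores have been flattened to the same length are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Turn raw attention scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def stage_diversity_loss(stage_scores):
    """Negative mean pairwise KL between stages (hypothetical formulation).

    Minimizing this value maximizes the divergence between the attention
    distributions of different stages. Assumes every stage's scores were
    already flattened (and resized, if needed) to the same length.
    """
    dists = [softmax(s) for s in stage_scores]
    n = len(dists)
    if n < 2:
        return 0.0  # nothing to compare with a single stage
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    total = sum(kl_divergence(dists[i], dists[j]) for i, j in pairs)
    return -total / len(pairs)
```

Under these assumptions, two stages with identical attention scores give a loss of zero, while stages that concentrate on different positions give a negative loss, so gradient descent on this term would drive the stages apart.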
