Novel concept-based image captioning models using LSTM and multi-encoder transformer architecture

Abstract

Captioning an image involves using a combination of vision and language models to describe the image in an expressive and concise sentence. A successful captioning model must extract as much information as possible from the image, and one key piece of information is the topic to which the image belongs. State-of-the-art methods have extracted these topics with topic modeling applied only to the caption text, which ignores the image's semantic information. Concept modeling, in contrast, extracts concepts directly from the images in addition to considering the corresponding caption text. Applied to image captioning, concept modeling captures the image context more fully and exploits it to produce more accurate descriptions. In this paper, novel image captioning models are proposed that utilize the concept modeling technique. The first concept-based model uses an LSTM as the decoder, while the second is built on a new multi-encoder transformer architecture. The proposed models were evaluated with standard metrics on the Microsoft COCO and Flickr30K datasets, where they outperformed related methods with reduced computational complexity.
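The abstract does not specify how the multi-encoder transformer fuses its encoders, so the following is only a minimal sketch of one common fusion pattern: the decoder cross-attends over the concatenated outputs of two encoders (here, hypothetical "visual" and "concept" feature vectors in a toy 2-dimensional space). The feature values and the concatenation-based fusion are assumptions for illustration, not the paper's actual design.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, keys, values):
    # Scaled dot-product attention for a single query vector.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return context, weights

# Hypothetical encoder outputs (toy 2-d features for illustration only):
visual_feats = [[1.0, 0.0], [0.0, 1.0]]   # from a visual encoder
concept_feats = [[0.5, 0.5]]              # from a concept encoder

# Multi-encoder fusion sketch: the decoder cross-attends over the
# concatenation of both encoders' output sequences.
memory = visual_feats + concept_feats
query = [1.0, 0.0]                        # one decoder-side query vector
context, weights = attend(query, memory, memory)
print(len(weights))  # one attention weight per encoder token (3 here)
```

In a real captioning decoder this attention step would run per layer and per generated token; the sketch only shows how tokens from both encoders can compete in a single attention distribution.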
