Text-in-Image Enhanced Self-Supervised Alignment Model for Aspect-Based Multimodal Sentiment Analysis on Social Media

Abstract

The rapid growth of social media has created a need for opinion mining and sentiment analysis on multimodal samples. As a fine-grained task within multimodal sentiment analysis, aspect-based multimodal sentiment analysis (ABMSA) enables accurate and efficient determination of the sentiment polarity expressed toward aspect-level targets. However, traditional ABMSA methods often perform suboptimally on social media samples, because the images in these samples frequently contain embedded text that influences sentiment judgment yet is overlooked by conventional models. To address this issue, we propose a text-in-image enhanced self-supervised alignment model (TESAM) that makes more comprehensive use of multimodal information. Specifically, we employ optical character recognition (OCR) to extract embedded text from images and, on the principle that text-in-image is an integral part of the visual modality, fuse it with visual features to obtain more complete image representations. We also use aspect words to guide the model to disregard semantic features irrelevant to the target aspect, thereby reducing noise interference. Furthermore, to narrow the semantic gap between modalities, we pre-train the feature extraction module with self-supervised alignment: during this pre-training stage, the unimodal semantic embeddings of the two modalities are aligned by computing an error based on Euclidean distance and cosine similarity. Experimental results show that TESAM achieves strong performance on three ABMSA benchmarks, validating the rationale and effectiveness of the proposed improvements.
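The abstract says only that the alignment error between the unimodal text and image embeddings is computed from Euclidean distance and cosine similarity; it does not give the exact formula. The sketch below shows one plausible way to combine the two terms over a batch of paired embeddings. The weighted sum, the weight `alpha`, and the function names are assumptions for illustration, not TESAM's published formulation.

```python
import math
from typing import Sequence


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """L2 distance between two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u: Sequence[float], v: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine of the angle between u and v; eps guards against division by zero."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)


def alignment_loss(text_embs, image_embs, alpha: float = 0.5) -> float:
    """Hypothetical alignment error averaged over a batch of paired embeddings.

    Each (text, image) pair contributes
        alpha * euclidean_distance + (1 - alpha) * (1 - cosine_similarity),
    so identical non-zero pairs contribute 0 and the error grows as the
    embeddings drift apart in either magnitude or direction.
    """
    if len(text_embs) != len(image_embs) or not text_embs:
        raise ValueError("need a non-empty batch of paired embeddings")
    total = 0.0
    for t, i in zip(text_embs, image_embs):
        dist = euclidean_distance(t, i)
        cos = cosine_similarity(t, i)
        total += alpha * dist + (1.0 - alpha) * (1.0 - cos)
    return total / len(text_embs)


if __name__ == "__main__":
    # Identical pairs versus orthogonal pairs (toy 2-D embeddings).
    aligned = alignment_loss([[1.0, 0.0]], [[1.0, 0.0]])
    misaligned = alignment_loss([[1.0, 0.0]], [[0.0, 1.0]])
    print(aligned, misaligned)
```

Combining the two measures penalizes both a difference in direction (cosine term) and a difference in scale (distance term); using either alone would ignore one of these. In a real model the same computation would run on framework tensors so that gradients flow back into the feature extraction module.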