Mutual contextual relation-guided dynamic graph networks for cross-modal image-text retrieval


Abstract

With the rapid growth of multimodal data on the web, cross-modal retrieval has become increasingly important for applications such as multimedia search and content recommendation. It aims to align visual and textual features so that semantically relevant content can be retrieved across modalities, but this remains challenging due to the inherent heterogeneity and semantic gap between image and text representations. This paper proposes a mutual contextual relation-guided dynamic graph network that integrates ViT, BERT, and a graph convolutional neural network (GCNN) to construct a unified, interpretable multimodal representation space for image-text matching. The ViT and BERT features are structured into a dynamic cross-modal feature graph (DCMFG), whose nodes represent image and text features and whose edges are dynamically updated based on mutual contextual relations, i.e., neighboring relations extracted with KNN. An attention-guided mechanism refines the graph connections, ensuring adaptive, context-aware alignment between modalities. The mutual contextual relations identify relevant neighborhood structures among image and text nodes, enabling the graph to capture both local and global associations, while the attention mechanism dynamically weights edges, strengthening the propagation of important cross-modal interactions. Emphasizing meaningful edges between nodes of different modalities also improves interpretability by revealing how image regions and text features interact. This approach overcomes the limitations of existing models that rely on static feature alignment and insufficient modeling of contextual relationships. Experimental results on the benchmark datasets MirFlickr-25K and NUS-WIDE demonstrate significant improvements in precision and recall over state-of-the-art methods, validating the approach's effectiveness for cross-modal retrieval.
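The core idea in the abstract — connect image and text nodes via KNN-based neighboring relations, then re-weight those edges with attention — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the function names, the cosine-similarity KNN, and the scaled dot-product attention over neighbors are all illustrative assumptions standing in for the DCMFG construction it describes.

```python
import numpy as np

def knn_adjacency(feats, k):
    """Mutual contextual relations as a KNN graph: each node is linked
    to its k nearest neighbors by cosine similarity (illustrative choice)."""
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)          # forbid self-loops
    adj = np.zeros_like(sim)
    for i in range(len(sim)):
        adj[i, np.argsort(sim[i])[-k:]] = 1.0  # top-k neighbors per node
    return adj

def attention_weighted_edges(feats, adj):
    """Re-weight existing edges with softmax attention over each node's
    neighbors (a stand-in for the paper's attention-guided refinement)."""
    scores = feats @ feats.T / np.sqrt(feats.shape[1])  # scaled dot product
    scores = np.where(adj > 0, scores, -np.inf)         # mask non-edges
    ex = np.exp(scores - scores.max(axis=1, keepdims=True))
    ex = np.where(adj > 0, ex, 0.0)
    return ex / ex.sum(axis=1, keepdims=True)           # rows sum to 1

# Toy example: 3 "image" nodes and 3 "text" nodes already embedded in a
# shared 4-d space (in the paper these would come from ViT and BERT).
rng = np.random.default_rng(0)
nodes = rng.standard_normal((6, 4))
adj = knn_adjacency(nodes, k=2)
weights = attention_weighted_edges(nodes, adj)
```

Because the adjacency depends on the current node features, recomputing `knn_adjacency` after each feature update is what makes such a graph "dynamic": as GCNN message passing refines the embeddings, the neighborhood structure (and thus the cross-modal edges) changes with them.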
