Hyperbolic vision language representation learning on chest radiology images


Abstract

Given the visual-semantic hierarchy between images and texts, hyperbolic embeddings have been employed for visual-semantic representation learning, leveraging the hierarchy-modeling advantages of hyperbolic space. This approach shows notable benefits in zero-shot learning tasks. However, unlike general image-text alignment tasks, textual data in the medical domain often comprises complex sentences describing multiple conditions or diseases, which makes free-text medical reports difficult for vision-language models to comprehend. We therefore propose a novel pretraining method for medical image-text data in hyperbolic space. The method uses structured radiology reports, each represented as a set of triplets, and converts these triplets into sentences through prompt engineering. To address the fact that diseases and symptoms typically appear in local regions, we introduce a combined global and local image feature extraction module. Exploiting the hierarchy-modeling advantages of hyperbolic space, we employ an entailment loss to model the partial order between images and texts. Experimental results show that our method generalizes better and outperforms baseline methods on various zero-shot tasks and across multiple datasets.
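The abstract does not spell out the entailment loss, but a common formulation for modeling partial order in hyperbolic space is the entailment-cone penalty of Ganea et al. (2018) in the Poincaré ball: a more general concept (here, the text embedding) defines a cone, and a more specific one (the image embedding) is penalized for falling outside it. The sketch below is an illustrative assumption, not the paper's exact loss; the constant `K` and the Poincaré-ball parameterization are hypothetical choices.

```python
import numpy as np

K = 0.1  # hyperparameter bounding the minimum cone aperture (assumed value)

def half_aperture(x, eps=1e-8):
    # Half-aperture of the entailment cone rooted at x in the Poincare ball:
    # cones are wider near the origin (general concepts) and narrower near
    # the boundary (specific concepts).
    nx = np.linalg.norm(x)
    return np.arcsin(np.clip(K * (1 - nx**2) / (nx + eps), -1 + eps, 1 - eps))

def exterior_angle(x, y, eps=1e-8):
    # Angle at x between the geodesic toward y and the ray from the origin
    # through x (closed form for the Poincare ball).
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    dot = float(np.dot(x, y))
    num = dot * (1 + nx**2) - nx**2 * (1 + ny**2)
    den = nx * np.linalg.norm(x - y) * np.sqrt(1 + nx**2 * ny**2 - 2 * dot) + eps
    return np.arccos(np.clip(num / den, -1 + eps, 1 - eps))

def entailment_loss(text_emb, image_emb):
    # Zero loss when the image embedding lies inside the cone of the (more
    # general) text embedding, i.e. when the partial order text -> image holds;
    # otherwise penalize by how far it falls outside.
    return max(0.0, exterior_angle(text_emb, image_emb) - half_aperture(text_emb))
```

For example, an image embedding placed farther along the same ray as its text embedding incurs (near-)zero loss, while one on the opposite side of the origin incurs a large penalty, which is exactly the ordering behavior the abstract describes.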
