Knowledge-enhanced medical image classification via descriptive priors from large language models


Abstract

Medical image classification aims to categorise clinically significant imaging patterns, thereby facilitating accurate and timely diagnosis. However, existing approaches predominantly rely on visual features extracted from raw pixel data, often overlooking fine-grained diagnostic cues grounded in medical expertise. To address this limitation, we propose a novel knowledge-enhanced model, KEM, that leverages medical large vision-language models (Medical LVLMs) as domain experts to generate descriptive priors, which are used to guide and support clinical decision-making. Specifically, we prompt Medical LVLMs to generate rich, multi-dimensional clinical descriptions tailored to each input image, capturing nuanced semantics. These descriptive priors are then encoded and fused with visual features through a dual cross-attention module, which enables bidirectional interaction and alignment between modalities. This design allows the model to dynamically attend to both textual and visual cues, thereby enhancing its ability to recognise subtle disease patterns. Comprehensive experiments on four benchmark datasets demonstrate that the proposed method significantly outperforms state-of-the-art vision-only models and exhibits strong generalisation across varied clinical settings.
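The dual cross-attention fusion described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: the module name, feature dimensions, and residual-plus-norm layout are assumptions; the only property taken from the abstract is that visual tokens attend to the encoded textual priors and the textual tokens attend back to the visual features.

```python
import torch
import torch.nn as nn

class DualCrossAttention(nn.Module):
    """Hypothetical sketch of bidirectional text-vision fusion:
    each modality queries the other via cross-attention."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # text -> vision: visual tokens query the descriptive priors
        self.vis_from_txt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # vision -> text: textual tokens query the visual features
        self.txt_from_vis = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_v = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, vis: torch.Tensor, txt: torch.Tensor):
        # vis: (B, Nv, dim) visual patch features
        # txt: (B, Nt, dim) encoded descriptive priors
        v_upd, _ = self.vis_from_txt(query=vis, key=txt, value=txt)
        t_upd, _ = self.txt_from_vis(query=txt, key=vis, value=vis)
        # residual connections preserve each modality's original signal
        return self.norm_v(vis + v_upd), self.norm_t(txt + t_upd)

# toy usage with random tensors standing in for encoder outputs
fusion = DualCrossAttention(dim=256)
vis = torch.randn(2, 49, 256)   # e.g. a 7x7 grid of image patch features
txt = torch.randn(2, 16, 256)   # e.g. 16 tokens of encoded description
v_out, t_out = fusion(vis, txt)
print(v_out.shape, t_out.shape)
```

The fused visual tokens would then feed a classification head; the symmetric text branch is what makes the interaction bidirectional rather than a one-way text-conditioning of the image features.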
