A multi-modal prompt-tuning method of ultrasound diagnosis for thyroid nodule



Abstract

BACKGROUND AND OBJECTIVE: Accurate diagnosis of thyroid nodules from ultrasound images depends heavily on the clinical expertise of radiologists. This reliance poses significant challenges in underdeveloped countries and regions where access to specialized medical resources is limited. Recently, Multi-modal Large Language Models (M-LLMs) have shown promise in handling heterogeneous data, such as images and text, making them attractive candidates for automating labor-intensive diagnostic tasks. However, M-LLMs often struggle in ultrasound diagnosis of thyroid nodules for two main reasons: (1) without domain-specific fine-tuning, they are prone to generating hallucinated content, especially in classification tasks that demand expert-level decision-making; and (2) the multi-modal ultrasound datasets of thyroid nodules that are essential for fine-tuning M-LLMs are prohibitively costly and laborious to build.

METHODS: We propose a novel multi-modal prompt-tuning method based on ultrasound images and textual descriptions, which can assist radiologists in improving their diagnoses of the etiology of thyroid nodules. Our approach leverages an image encoder and a prompt-tuning framework to learn effective representations from both modalities without the need for expensive full-model fine-tuning. The learned features are then fed into a multi-layer perceptron (MLP) that fuses the multi-modal relationships, complementing the image features and assisting in the diagnosis of thyroid nodules.

RESULTS: Extensive experiments on publicly available and privately collected datasets demonstrate that our method achieved state-of-the-art performance. It significantly outperformed traditional single-modality methods, with accuracy improvements of up to 40.62% over ResNet and 28.51% over AlexNet on the publicly available dataset. Compared with other multi-modal models, our method improved accuracy and F1 score by up to 23.12% and 25.21%, respectively.

CONCLUSIONS: Our method surpassed all participating radiologists in accuracy, highlighting its strong potential to assist in expert-level diagnostic decision-making and to provide scalable support for resource-limited clinical environments. In practice, it facilitates faster and more consistent thyroid nodule screening, thereby enhancing diagnostic efficiency.
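The fusion step described in METHODS can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how features are combined, so concatenation is an assumption, and the feature dimensions, the two output classes, the random untrained weights, and every function name here (`init_layer`, `fuse_and_classify`, and so on) are hypothetical. In the described method, the two input vectors would come from an image encoder and a prompt-tuned text pathway; here they are random placeholders.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def softmax(x):
    """Numerically stable softmax over a list of logits."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random small weights and zero biases (untrained, illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def fuse_and_classify(image_feat, text_feat, hidden, out):
    """Concatenate both modalities (an assumption), then apply a 2-layer MLP."""
    fused = image_feat + text_feat          # list concatenation
    h = relu(linear(fused, *hidden))
    return softmax(linear(h, *out))


rng = random.Random(0)
# Placeholder features; hypothetical sizes of 8 (image) and 4 (text).
image_feat = [rng.random() for _ in range(8)]
text_feat = [rng.random() for _ in range(4)]
hidden = init_layer(12, 6, rng)             # 12 = 8 + 4 fused inputs
out = init_layer(6, 2, rng)                 # 2 hypothetical classes
probs = fuse_and_classify(image_feat, text_feat, hidden, out)
print(probs)
```

Because the weights are untrained, the printed probabilities carry no diagnostic meaning; the sketch only shows how text-derived features can complement image features in a single classifier input.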
