ConsTCM: aligning fundus images with constitution differentiation in multimodal language model for Traditional Chinese Medicine


Abstract

Visual Language Models (VLMs) have shown significant potential on multimodal tasks across a wide range of domains, such as medical image understanding and comprehensive diagnosis. In Traditional Chinese Medicine (TCM), VLMs have also achieved promising performance on tasks including symptom differentiation and constitution diagnosis. However, these TCM-related tasks face two challenges: the lack of TCM-specific multimodal datasets and the weak association between examination results and TCM treatment strategies. To address these problems, we propose ConsTCM, a model trained via a two-stage supervised fine-tuning (SFT) pipeline that progresses from basic SFT to task-specific SFT. The basic SFT stage uses general ophthalmoscopy datasets, encouraging the model to capture general features of fundus images. For the specific SFT stage, we collected self-labeled real-world cases from hospitals, which are used to further fine-tune the model and improve its performance on TCM constitution tasks. In addition, we constructed the fine-tuning dataset by mixing existing classification tasks with the real-world hospital data. Training with this pipeline yields ConsTCM. A series of experiments validates the pipeline: compared with baseline models, ConsTCM achieves a 5.6% performance improvement, and an ablation study shows that the two-stage pipeline substantially enhances performance, with ConsTCM outperforming the base model by 47% in objective evaluation.
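The basic-SFT-then-specific-SFT ordering described above can be sketched in miniature. The snippet below is purely illustrative and is not the paper's implementation: it stands in a tiny logistic classifier for the VLM, a two-point "general" dataset for the public ophthalmoscopy data, and a two-point "specific" dataset (with a shifted decision boundary) for the hospital-labeled TCM constitution cases. The point it demonstrates is the pipeline shape: stage 2 continues training from the stage-1 weights rather than from scratch.

```python
import math

def sgd_finetune(weights, dataset, lr=0.1, epochs=200):
    """One SFT stage: continue training `weights` on `dataset` with
    logistic-regression SGD (a stand-in for gradient fine-tuning)."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in dataset:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi   # gradient step on log-loss
    return w

def accuracy(w, data):
    hits = sum((sum(wi * xi for wi, xi in zip(w, x)) > 0) == bool(y)
               for x, y in data)
    return hits / len(data)

# Stage 1 data: "general" task, decision boundary at x = 0 (analogue of
# the public ophthalmoscopy datasets). The second feature is a bias term.
general = [([-1.0, 1.0], 0), ([1.0, 1.0], 1)]
# Stage 2 data: "specific" task with a shifted boundary at x = 1
# (analogue of the hospital-labeled TCM constitution cases).
specific = [([0.5, 1.0], 0), ([1.5, 1.0], 1)]

w_basic = sgd_finetune([0.0, 0.0], general)   # basic SFT stage
w_final = sgd_finetune(w_basic, specific)     # specific SFT stage
```

After stage 1 the model handles the general task but misclassifies the shifted specific data; stage 2 adapts it to the specific boundary, mirroring how the paper's second stage specializes the generally pretrained model to constitution differentiation.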
