A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning


Abstract

The adoption of large language models (LLMs) in healthcare has garnered significant research interest, yet their performance remains limited by a lack of domain-specific knowledge, weak medical reasoning skills, and their unimodal nature, which restricts them to text-only inputs. To address these limitations, we propose MultiMedRes, a multimodal medical collaborative reasoning framework that simulates how human physicians communicate by incorporating a learner agent that proactively acquires information from domain-specific expert models. MultiMedRes addresses multimodal medical reasoning problems through three steps: i) Inquire: the learner agent decomposes a complex medical reasoning problem into multiple domain-specific sub-problems; ii) Interact: the agent engages in iterative "ask-answer" exchanges with expert models to obtain domain-specific knowledge; and iii) Integrate: the agent integrates all the acquired domain-specific knowledge to solve the original problem (e.g., identifying differences in disease severity and abnormality size between medical images). We validate the effectiveness of our method on the task of difference visual question answering for X-ray images. The experiments show that our zero-shot prediction achieves state-of-the-art performance, surpassing fully supervised methods. This demonstrates that MultiMedRes can offer trustworthy and interpretable assistance to physicians in monitoring patients' treatment progression, paving the way for effective human-AI interaction and collaboration.
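The three-step Inquire/Interact/Integrate loop described above can be sketched in code. This is a minimal illustrative skeleton, not the authors' implementation: every class, function, and expert name below is an assumption, and the stub expert stands in for real domain-specific models (e.g., an X-ray VQA model).

```python
# Hedged sketch of the MultiMedRes loop (Inquire / Interact / Integrate).
# All names here are illustrative assumptions, not the paper's actual API.
from typing import Callable, Dict, List

def inquire(question: str) -> List[str]:
    # Inquire: decompose a complex question into domain-specific
    # sub-questions. A real system would prompt the learner LLM;
    # this stub returns fixed sub-questions for the example task.
    return [
        "What abnormality is visible in the reference X-ray?",
        "What abnormality is visible in the follow-up X-ray?",
    ]

def interact(sub_questions: List[str],
             experts: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    # Interact: iterative "ask-answer" exchanges with expert models.
    answers = {}
    for sq in sub_questions:
        expert = experts["xray_vqa"]  # routing to one expert is assumed
        answers[sq] = expert(sq)
    return answers

def integrate(question: str, answers: Dict[str, str]) -> str:
    # Integrate: combine the acquired knowledge into a final answer.
    evidence = "; ".join(f"{q} -> {a}" for q, a in answers.items())
    return f"Answer to '{question}' based on: {evidence}"

if __name__ == "__main__":
    experts = {"xray_vqa": lambda q: "small left-lung opacity"}  # stub expert
    question = "How did the abnormality change between the two X-rays?"
    subs = inquire(question)
    print(integrate(question, interact(subs, experts)))
```

In a full system, `interact` would loop until the learner agent judges it has enough evidence, and `integrate` would itself be an LLM call; the structure of the three phases is what this sketch conveys.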
