The role of multimodality in clinical disease diagnosis: advances, challenges, and opportunities

Abstract

Advances in artificial intelligence (AI) have significantly improved medical diagnosis, with deep learning models achieving expert-level performance across unimodal tasks such as medical imaging, physiological signal analysis, electronic health record (EHR) modeling, and omics-based prediction. However, clinical decision-making is inherently multimodal, as diseases manifest through complex interactions among imaging phenotypes, molecular signatures, physiological measurements, and textual clinical documentation. Consequently, unimodal systems often lack robustness, generalizability, and clinical reliability. This survey provides a comprehensive and methodologically grounded review of multimodal learning for disease diagnosis, emphasizing the paradigm shifts that have emerged over the past five years. Beyond classical early, intermediate, and late fusion strategies, we synthesize modern cross-modal representation learning frameworks, including contrastive alignment, vision-language pretraining, graph- and hypergraph-based multimodal reasoning, modality-agnostic representation learning, and architectures robust to missing modalities. We further examine large-scale, foundation-model-style multimodal pretraining and recent advances in histology-transcriptomics and image-omics integration, which exemplify biologically grounded cross-modal learning beyond traditional fusion pipelines. In addition to summarizing widely used datasets and clinical applications across oncology, neurology, cardiology, pulmonology, and ophthalmology, we provide a methodological synthesis that links key challenges (modality heterogeneity, incomplete data, fairness disparities, interpretability limitations, and cross-institutional distribution shift) to representative solution frameworks proposed in the literature. By integrating theoretical formulations, architectural insights, and application-driven evidence, this survey moves beyond case-oriented performance comparisons and offers a structured perspective on how multimodal AI is evolving toward scalable, robust, and clinically trustworthy diagnostic systems.
