A Benchmark for Breast Cancer Screening and Diagnosis in Mammogram Visual Question Answering

Abstract

Breast cancer remains the most prevalent malignancy in women worldwide, and mammography-based early detection plays a pivotal role in improving patient survival. Large vision-language models (LVLMs) offer transformative potential for mammogram visual question answering (VQA), but the absence of a standardized evaluation benchmark makes it hard to compare their performance in mammogram interpretation fairly. In this study, we address this gap through three contributions: (1) We introduce MammoVQA, a mammogram VQA dataset that unifies 15 public datasets, comprising 131,847 images (421K question-answer pairs) for image-level cases and 72,518 exams (476K images, 144K question-answer pairs) for exam-level cases. (2) We systematically evaluate 12 recent high-performance LVLMs (6 general, 6 medical) and find diagnostic performance statistically equivalent to random guessing, highlighting their unreliability for mammogram interpretation. (3) Our domain-optimized LLaVA-Mammo achieves an average weighted accuracy gain of +19.66% over the best recent high-performance model in internal validation, with an average weighted accuracy improvement of +21.21% in external validation.
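To make the comparison against random guessing concrete, the following is a minimal sketch of how a sample-weighted accuracy and a chance baseline could be computed for multiple-choice questions. The abstract does not define its weighting scheme or its statistical test, so the weighting by question count, the normal-approximation z-score, and all function names here are assumptions for illustration only, not the authors' method.

```python
import math


def chance_accuracy(num_options):
    """Expected accuracy when guessing uniformly among num_options answers."""
    return 1.0 / num_options


def weighted_accuracy(results):
    """Average per-category accuracies, weighted by question count.

    results: list of (accuracy, n_questions) tuples, one per category.
    Assumption: weights are question counts; the paper may weight differently.
    """
    total = sum(n for _, n in results)
    return sum(acc * n for acc, n in results) / total


def z_score_vs_chance(correct, total, num_options):
    """Normal-approximation z-score of observed accuracy against chance."""
    p0 = chance_accuracy(num_options)
    standard_error = math.sqrt(p0 * (1.0 - p0) / total)
    return (correct / total - p0) / standard_error


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the paper.
    print(weighted_accuracy([(0.5, 100), (0.25, 300)]))
    print(z_score_vs_chance(correct=260, total=1000, num_options=4))
```

Under this sketch, a z-score with absolute value below about 1.96 would mean the observed accuracy cannot be distinguished from guessing at the conventional 5% level.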
