[Evaluating the performance of generative AI in assisting the differential diagnosis of weight loss]

Abstract

OBJECTIVES: To systematically evaluate the performance of generative artificial intelligence (GenAI) models, DeepSeek-V3 and the Qwen3 series, in the differential diagnosis of weight loss. METHODS: The PubMed database was searched for all case reports published in the American Journal of Case Reports between January 1, 2012 and June 2, 2025 that contained the term "weight loss" in the title or abstract. Two senior general practitioners independently reviewed each case to determine whether it met predefined diagnostic criteria for weight loss (emaciation). Cases that did not meet these criteria, had incomplete information, or involved clearly defined specialty-specific diagnoses and treatments were excluded. The remaining cases were compiled into standardized clinical case summaries. These summaries were presented to DeepSeek-V3 and the Qwen3 series models (Qwen3-235B-A22B, Qwen3-30B-A3B, and Qwen3-32B), each of which generated a ranked list of the top 10 differential diagnoses. The models were not fine-tuned for this task. Performance was evaluated using sensitivity, precision, and F1-score. Intergroup comparisons were performed using McNemar's test and Cochran's Q test. RESULTS: A total of 87 cases were analyzed. DeepSeek-V3 outperformed Qwen3-235B-A22B in sensitivity, precision, and F1-score, especially at the Top-5 level (P=0.043). Among the Qwen3 series models, Qwen3-235B-A22B showed the best sensitivity, precision, and F1-score for the Top-1 diagnosis, but the differences among the three Qwen3 models were not statistically significant at any diagnostic level (all P>0.05). CONCLUSIONS: Domestic (Chinese-developed) GenAI models show a "breadth over precision" pattern in the differential diagnosis of weight loss, with DeepSeek-V3 performing better at key diagnostic levels. Although the sensitivity and precision for the top-ranked diagnosis require improvement, these models have the potential to serve as effective clinical decision support tools, broadening the diagnostic perspectives of general practitioners.
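
The abstract does not spell out how the Top-k metrics or the paired comparison were computed, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a case counts as a "hit" at level k when the reference diagnosis appears among the first k ranked candidates (matched here by case-insensitive string equality, whereas a real study would likely rely on clinician judgment), and it uses the exact binomial form of McNemar's test on the paired per-case hits of two models. The function names `top_k_hit`, `f1_score`, and `mcnemar_exact` are hypothetical.

```python
from math import comb


def top_k_hit(ranked, truth, k):
    """Return True if the reference diagnosis is among the first k candidates.

    Assumption: diagnoses are matched by case-insensitive string equality.
    """
    return truth.lower() in (d.lower() for d in ranked[:k])


def f1_score(sensitivity, precision):
    """Harmonic mean of sensitivity (recall) and precision."""
    if sensitivity + precision == 0:
        return 0.0
    return 2 * sensitivity * precision / (sensitivity + precision)


def mcnemar_exact(hits_a, hits_b):
    """Two-sided exact McNemar p-value for paired per-case hits of two models.

    Only discordant cases (one model hits, the other misses) carry information;
    under the null hypothesis they split 50/50 between the two models.
    """
    only_a = sum(1 for a, b in zip(hits_a, hits_b) if a and not b)
    only_b = sum(1 for a, b in zip(hits_a, hits_b) if b and not a)
    n = only_a + only_b
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(only_a, only_b) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Illustrative, made-up data: per-case hits for two hypothetical models.
model_a_hits = [True, True, True, True, False, True]
model_b_hits = [False, False, False, False, True, True]
p_value = mcnemar_exact(model_a_hits, model_b_hits)
```

Because the same cases are shown to every model, a paired test such as McNemar's fits better than an unpaired comparison of proportions; Cochran's Q extends the same idea to three or more models.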
