Diagnostic accuracy of GPT-4 on common clinical scenarios and challenging cases

Abstract

INTRODUCTION: Large language models (LLMs) achieve high diagnostic accuracy when evaluated on previously published clinical cases. METHODS: We compared the accuracy of GPT-4's differential diagnoses for previously unpublished challenging case scenarios with its diagnostic accuracy for previously published cases. RESULTS: For a set of previously unpublished challenging clinical cases, GPT-4 included the correct diagnosis among its top 6 diagnoses in 61.1% of cases, versus the previously reported 49.1% for physicians. For a set of 45 clinical vignettes of more common clinical scenarios, GPT-4 included the correct diagnosis among its top 3 diagnoses 100% of the time, versus the previously reported 84.3% for physicians. CONCLUSIONS: GPT-4 performs at least as well as, if not better than, experienced physicians on highly challenging cases in internal medicine. Its extraordinary performance on common clinical scenarios could be explained in part by the fact that these cases were previously published and may have been included in the LLM's training dataset.
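
The "top 6" and "top 3" figures above are top-k accuracies: the share of cases in which the correct diagnosis appears among the first k entries of a ranked differential. The following is a minimal sketch of that calculation, not the authors' method. The function name `top_k_accuracy`, the toy cases, and the use of case-insensitive exact string matching are all assumptions made for illustration; the abstract does not describe how a match between a listed diagnosis and the correct one was judged.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_differentials: Sequence[Sequence[str]],
    correct_diagnoses: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose correct diagnosis is among the first k entries.

    Matching is case-insensitive exact string comparison, which is a
    simplifying assumption for this sketch.
    """
    if len(ranked_differentials) != len(correct_diagnoses):
        raise ValueError("need exactly one correct diagnosis per case")
    if not correct_diagnoses:
        raise ValueError("need at least one case")
    if k < 1:
        raise ValueError("k must be at least 1")

    hits = 0
    for ranked, truth in zip(ranked_differentials, correct_diagnoses):
        # Only the first k entries of the ranked list count.
        top_k = {d.strip().lower() for d in ranked[:k]}
        if truth.strip().lower() in top_k:
            hits += 1
    return hits / len(correct_diagnoses)


# Hypothetical toy cases for illustration only; not data from the study.
toy_differentials = [
    ["pneumonia", "pulmonary embolism", "heart failure"],
    ["migraine", "tension headache", "subarachnoid hemorrhage"],
    ["gastritis", "peptic ulcer", "pancreatitis"],
]
toy_truths = ["pneumonia", "subarachnoid hemorrhage", "appendicitis"]

top1 = top_k_accuracy(toy_differentials, toy_truths, k=1)
top3 = top_k_accuracy(toy_differentials, toy_truths, k=3)
print(f"top-1: {top1:.3f}, top-3: {top3:.3f}")
```

In the toy data, widening the cutoff from 1 to 3 picks up the second case, whose correct diagnosis is ranked third, while the third case is missed at any k because the correct diagnosis is absent from the list.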
