Microsoft Bing outperforms five other generative artificial intelligence chatbots in the Antwerp University multiple choice medical license exam

在安特卫普大学多项选择题形式的医学执照考试中,微软必应的表现优于其他五款生成式人工智能聊天机器人。

阅读:2

Abstract

Recently developed chatbots based on large language models (further called bots) have promising features which could facilitate medical education. Several bots are freely available, but their proficiency has been insufficiently evaluated. In this study the authors have tested the current performance on the multiple-choice medical licensing exam of University of Antwerp (Belgium) of six widely used bots: ChatGPT (OpenAI), Bard (Google), New Bing (Microsoft), Claude instant (Anthropic), Claude+ (Anthropic) and GPT-4 (OpenAI). The primary outcome was the performance on the exam expressed as a proportion of correct answers. Secondary analyses were done for a variety of features in the exam questions: easy versus difficult questions, grammatically positive versus negative questions, and clinical vignettes versus theoretical questions. Reasoning errors and untruthful statements (hallucinations) in the bots' answers were examined. All bots passed the exam; Bing and GPT-4 (both 76% correct answers) outperformed the other bots (62-67%, p = 0.03) and students (61%). Bots performed worse on difficult questions (62%, p = 0.06), but outperformed students (32%) on those questions even more (p<0.01). Hallucinations were found in 7% of Bing's and GPT4's answers, significantly lower than Bard (22%, p<0.01) and Claude Instant (19%, p = 0.02). Although the creators of all bots try to some extent to avoid their bots being used as a medical doctor, none of the tested bots succeeded as none refused to answer all clinical case questions.Bing was able to detect weak or ambiguous exam questions. Bots could be used as a time efficient tool to improve the quality of a multiple-choice exam.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。