Examining the Performance of ChatGPT 3.5 and Microsoft Copilot in Otolaryngology: A Comparative Study with Otolaryngologists' Evaluation



Abstract

This study evaluated how well ChatGPT 3.5 and an internet-connected GPT-4 engine (Microsoft Copilot) answered the questions of a competitive examination for otolaryngology posts in a public healthcare system, using the real scores of the otolaryngology specialists who sat the exam as the control group. In September 2023, 135 questions, divided into theoretical and practical parts, were input into ChatGPT 3.5 and the internet-connected GPT-4. The accuracy of the AI responses was compared with the official results of the otolaryngologists who took the exam, and statistical analysis was conducted using Stata 14.2. Copilot (GPT-4) outperformed ChatGPT 3.5, scoring 88.5 points against ChatGPT's 60. The two AIs differed in which questions they answered incorrectly. Although ChatGPT showed some proficiency, Copilot performed better: its score ranked second among the 108 otolaryngologists who took the exam, while ChatGPT placed 83rd. A chat tool powered by GPT-4 with internet access (Copilot) therefore responds to multiple-choice medical questions more accurately than ChatGPT 3.5.
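To put the reported ranks in perspective, the short sketch below converts a placement among the 108 examinees into the share of the cohort that placed lower. It assumes that each AI score is slotted into the 108-position ranking and that there are no ties; the abstract confirms neither, and the function name `percent_outranked` is purely illustrative.

```python
def percent_outranked(rank: int, cohort_size: int) -> float:
    """Percentage of the cohort placed below a given rank (rank 1 is best).

    Assumption: ranks are unique (no ties) and run from 1 to cohort_size.
    """
    if not 1 <= rank <= cohort_size:
        raise ValueError("rank must lie between 1 and cohort_size")
    return 100.0 * (cohort_size - rank) / cohort_size


# Figures taken from the abstract: 108 examinees, Copilot 2nd, ChatGPT 3.5 83rd.
COHORT_SIZE = 108
reported_ranks = {"Copilot": 2, "ChatGPT 3.5": 83}

for name, rank in reported_ranks.items():
    share = percent_outranked(rank, COHORT_SIZE)
    print(f"{name}: rank {rank} of {COHORT_SIZE}, ahead of {share:.1f}% of examinees")
```

Under these assumptions Copilot sits ahead of roughly 98% of the cohort and ChatGPT 3.5 ahead of roughly 23%.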
