Orthodontic knowledge assessment: A comparison of five AI Chatbots


Abstract

OBJECTIVES: This study aimed to evaluate the performance and efficacy of five AI chatbots in providing orthodontic information. MATERIALS AND METHODS: The study included 80 multiple-choice questions (MCQs) sourced from orthodontic exam materials based on renowned orthodontics textbooks. The accuracy of five AI chatbots (ChatGPT, DeepSeek, Gemini, Medgebra GPT and Meta AI) was assessed by their ability to select the correct answers, and their responses were compared with those of dental students on the same questions. The item-difficulty index score was also calculated for the AI responses. Dependent and independent MCQs were assessed separately for student and AI chatbot performance. RESULTS: Among all the chatbots, DeepSeek and Medgebra GPT provided the highest and lowest proportions of correct answers, respectively, in both rounds. The performance of Gemini (McNemar p = 0.64; kappa = 0.46, p < 0.001) and DeepSeek (McNemar p = 1.00; kappa = 0.74, p < 0.001) improved in the second round, with moderate and substantial levels of agreement, respectively, whereas the performance of ChatGPT and Meta AI did not improve in the second round. Inter-rater reliability between the item-difficulty indices based on student and AI performance showed a slight level of agreement (weighted kappa = 0.160), but this agreement was not statistically significant (p = 0.064). CONCLUSION: DeepSeek provided the highest fraction of correct answers in orthodontics, while Medgebra GPT provided the least. Although these chatbots shared some common understanding of orthodontics, their performance varied, and further refinement may be necessary to improve their consistency and accuracy in providing reliable answers.
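The statistical measures named in the abstract (item-difficulty index, Cohen's kappa, and the McNemar test for paired round-to-round responses) can be sketched in plain Python. This is an illustrative sketch only, not the study's actual analysis code; the function names and the toy inputs are hypothetical, and each measure follows its standard textbook definition (difficulty index as the proportion of correct responses; kappa as chance-corrected agreement; exact McNemar as a two-sided binomial test on the discordant pairs).

```python
from math import comb


def item_difficulty(correct, total):
    """Item-difficulty index: proportion of respondents answering correctly."""
    return correct / total


def cohen_kappa(a, b):
    """Cohen's kappa for two binary ratings (lists of 0/1), e.g. a chatbot's
    correct/incorrect pattern in round 1 vs round 2 on the same MCQs."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    pa1, pb1 = sum(a) / n, sum(b) / n
    # Chance agreement under independence of the two ratings.
    p_expected = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    return (p_observed - p_expected) / (1 - p_expected)


def mcnemar_exact_p(b, c):
    """Exact (binomial) two-sided McNemar p-value from the discordant counts:
    b = correct in round 1 only, c = correct in round 2 only."""
    n = b + c
    k = min(b, c)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)
```

For example, a chatbot answering 40 of 80 MCQs correctly gives an item-difficulty index of 0.5, and identical answer patterns across rounds give kappa = 1.0 (perfect agreement).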
