Abstract
OBJECTIVES: This study aimed to evaluate the performance and efficacy of five AI chatbots in providing orthodontic information.
MATERIALS AND METHODS: The study included 80 multiple-choice questions (MCQs) sourced from orthodontic examination materials based on renowned orthodontics textbooks. The accuracy of five AI chatbots (ChatGPT, DeepSeek, Gemini, Medgebra GPT and Meta AI) was assessed based on their ability to provide correct answers and was compared with the responses of dental students to the same questions. The item-difficulty index was also calculated for the AI responses. Dependent and independent MCQs were assessed separately for student and AI chatbot performance.
RESULTS: Among all the chatbots, DeepSeek and Medgebra GPT provided the highest and lowest proportions of correct answers, respectively, in both rounds. The performance of Gemini (McNemar p = 0.64; kappa = 0.46, p < 0.001) and DeepSeek (McNemar p = 1.00; kappa = 0.74, p < 0.001) improved in the second round, with moderate and substantial levels of agreement, respectively, whereas the performance of ChatGPT and Meta AI did not improve in the second round. Inter-rater reliability between the item-difficulty indices based on student and AI performance showed a slight level of agreement (weighted kappa = 0.160), but this agreement was not statistically significant (p = 0.064).
CONCLUSION: DeepSeek provided the highest proportion of correct answers in orthodontics, while Medgebra GPT provided the lowest. Although these chatbots shared some common understanding of orthodontics, their performance varied, and further refinement may be necessary to improve their consistency and accuracy in providing reliable answers.