Exploring the role of artificial intelligence in Turkish orthopedic progression exams

Abstract

OBJECTIVE: The aim of this study was to evaluate and compare the performance of the artificial intelligence (AI) models ChatGPT-3.5, ChatGPT-4, and Gemini on the Turkish Specialization Training and Development Examination (UEGS) to determine their utility in medical education and their potential to improve patient care.

METHODS: This retrospective study analyzed the responses of ChatGPT-3.5, ChatGPT-4, and Gemini to 1000 true-or-false questions from the UEGS administered over 5 years (2018-2023). Questions, encompassing 9 orthopedic subspecialties, were categorized by 2 independent residents, with discrepancies resolved by a senior author. Artificial intelligence models were restarted for each query to prevent data retention. Performance was evaluated by calculating net scores and comparing them to orthopedic resident scores obtained from the Turkish Orthopedics and Traumatology Education Council (TOTEK) database. Statistical analyses included chi-squared tests, Bonferroni-adjusted Z tests, Cochran's Q test, and receiver operating characteristic (ROC) analysis to determine the optimal question length for AI accuracy. All AI responses were generated independently without retaining prior information.

RESULTS: Significant differences in AI tool accuracy were observed across different years and subspecialties (P < .001). ChatGPT-4 consistently outperformed other models, achieving the highest overall accuracy (95% in specific subspecialties). Notably, ChatGPT-4 demonstrated superior performance in Basic and General Orthopedics and Foot and Ankle Surgery, while Gemini and ChatGPT-3.5 showed variability in accuracy across topics and years. Receiver operating characteristic analysis revealed a significant relationship between shorter letter counts and higher accuracy for ChatGPT-4 (P=.002). ChatGPT-4 also showed a significant negative correlation between letter count and accuracy across all years (r=-0.099, P=.002) and, unlike the other AI models, outperformed residents in basic and general orthopedics (P=.015) and trauma (P=.012).

CONCLUSION: The findings underscore the advancing role of AI in the medical field, with ChatGPT-4 demonstrating significant potential as a tool for medical education and clinical decision-making. Continuous evaluation and refinement of AI technologies are essential to enhance their educational and clinical impact.
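
The methods name Cochran's Q test among the analyses but do not show how it is computed. As a rough illustration only, the sketch below applies the standard Cochran's Q formula to binary correct/incorrect outcomes for three models answering the same questions. The function name `cochrans_q` and the toy data are hypothetical and are not taken from the study; the closed-form p-value used here holds only for 2 degrees of freedom, that is, for exactly three models.

```python
import math


def cochrans_q(rows):
    """Cochran's Q for matched binary outcomes.

    rows: one tuple per question, holding 0/1 (incorrect/correct)
          for each model, in a fixed model order.
    Returns (Q, degrees_of_freedom).
    """
    k = len(rows[0])  # number of models compared
    if any(len(r) != k for r in rows):
        raise ValueError("every question needs one outcome per model")

    col_totals = [sum(r[j] for r in rows) for j in range(k)]  # correct per model
    row_totals = [sum(r) for r in rows]                       # correct per question
    n = sum(row_totals)                                       # grand total

    denom = k * n - sum(t * t for t in row_totals)
    if denom == 0:
        raise ValueError("models agree on every question; Q is undefined")

    q = (k - 1) * (k * sum(c * c for c in col_totals) - n * n) / denom
    return q, k - 1


# Hypothetical toy data (NOT study data): 8 questions, 3 models.
# Column order is an assumption: (model A, model B, model C).
toy_rows = [
    (1, 1, 1),
    (0, 1, 1),
    (0, 1, 0),
    (1, 1, 1),
    (0, 1, 0),
    (0, 1, 1),
    (0, 0, 0),
    (1, 1, 0),
]

q, df = cochrans_q(toy_rows)

# With df = 2 the chi-squared survival function is exactly exp(-Q/2).
p_value = math.exp(-q / 2)

print(f"Q = {q:.2f}, df = {df}, p = {p_value:.3f}")  # Q = 5.20, df = 2, p = 0.074
```

For more than three models, or for the pairwise follow-up comparisons that a Bonferroni adjustment implies, a statistics library would be the more practical choice than this hand-rolled version.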
