Benchmarking multimodal large language models on the dental licensing examination: Challenges with clinical image interpretation

Abstract

BACKGROUND/PURPOSE: Large language models (LLMs) have been studied in text-based healthcare tasks, but their performance in multimodal dental applications has not yet been fully explored. This study evaluated the performance of four multimodal LLMs on dental licensing examination questions with both text-only and visually based components.

MATERIALS AND METHODS: Four multimodal LLMs, ChatGPT-4o (4o), OpenAI o1 (o1), Claude 3.5 Sonnet (Sonnet), and Gemini 2.0 Flash Thinking Experimental (Gemini), were tested on 353 questions from the 2024 Japanese National Dental Examination, comprising 204 text-only and 149 visually based questions spanning 17 dental specialties. A zero-shot approach was used without prompt engineering. Performance was analyzed using Cochran's Q test and McNemar's test with Bonferroni correction.

RESULTS: o1 achieved the highest overall correct response rate (81.9%), followed by Sonnet (71.7%), Gemini (66.6%), and 4o (65.7%). All models performed significantly better on text-only questions (79.9-92.2%) than on visually based questions (45.6-67.8%). Performance varied by specialty, with the highest scores in basic medical sciences (Dental pharmacology: 100%; Oral physiology: 86.7-100%) and lower scores in clinical specialties requiring visual interpretation (e.g., Orthodontics: 36.4-66.7%).

CONCLUSION: Multimodal LLMs show promising performance on dental examination questions, particularly on text-based items, but significant challenges remain in complex visual interpretation. The strong zero-shot performance of newer models such as o1 suggests potential applications in dental education and in certain aspects of clinical decision support, although further advances are needed before these models can be applied reliably in visually complex diagnostic workflows.
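The abstract names Cochran's Q test (an omnibus test for k related binary outcomes) together with McNemar's test and a Bonferroni correction. The sketch below illustrates that kind of analysis in standard-library Python, assuming each model's answer to each question is scored 1 (correct) or 0 (incorrect). It is not the authors' code. The helper names (`chi2_sf`, `cochran_q`, `mcnemar_exact`, `pairwise_bonferroni`), the choice of the exact binomial McNemar variant, and the eight-question score matrix are all hypothetical; the abstract does not say which McNemar variant was used, and the toy data are not study results.

```python
import math
from itertools import combinations


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Base cases: Q(x; 2) = exp(-x/2) and Q(x; 1) = erfc(sqrt(x/2)).
    if df % 2 == 0:
        total, k = math.exp(-half), 2
    else:
        total, k = math.erfc(math.sqrt(half)), 1
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    while k < df:
        total += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return total


def cochran_q(correct: list[list[int]]) -> tuple[float, float]:
    """Cochran's Q over a questions x models 0/1 matrix; returns (Q, p) with df = k - 1."""
    k = len(correct[0])
    col = [sum(row[j] for row in correct) for j in range(k)]  # correct answers per model
    row_tot = [sum(row) for row in correct]                   # models correct per question
    t = sum(row_tot)
    denom = k * t - sum(r * r for r in row_tot)
    if denom == 0:  # no question separates the models
        return 0.0, 1.0
    q = (k - 1) * (k * sum(c * c for c in col) - t * t) / denom
    return q, chi2_sf(q, k - 1)


def mcnemar_exact(a: list[int], b: list[int]) -> float:
    """Two-sided exact (binomial) McNemar p-value from the discordant pairs of two 0/1 vectors."""
    only_a = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    only_b = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n = only_a + only_b
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(only_a, only_b) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def pairwise_bonferroni(correct: list[list[int]], names: list[str]) -> dict[tuple[str, str], float]:
    """Exact McNemar for every model pair; each p-value is multiplied by the number of pairs."""
    pairs = list(combinations(range(len(names)), 2))
    out = {}
    for i, j in pairs:
        p = mcnemar_exact([r[i] for r in correct], [r[j] for r in correct])
        out[(names[i], names[j])] = min(1.0, p * len(pairs))
    return out


# Hypothetical toy scores (1 = correct), 8 questions x 4 models; NOT the study's data.
NAMES = ["4o", "o1", "Sonnet", "Gemini"]
TOY = [
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

q, p = cochran_q(TOY)
print(f"Cochran's Q = {q:.3f} (df = 3), p = {p:.4f}")  # Q = 22/3 for this toy matrix
for pair, p_adj in pairwise_bonferroni(TOY, NAMES).items():
    print(pair, f"Bonferroni-adjusted p = {p_adj:.3f}")
```

With four models there are six pairs, so the Bonferroni step multiplies each pairwise p-value by 6 (equivalently, each pair is tested at 0.05/6 ≈ 0.0083). In this sketch, the exact binomial form of McNemar's test was chosen because discordant counts between two strong models can be small. A chi-square McNemar statistic with a continuity correction is a common alternative.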
