Abstract
BACKGROUND/PURPOSE: Large language models (LLMs) offer promising applications in dentistry, but their performance in specialized, image-rich contexts such as dental technology examinations remains uncertain. This study evaluated the accuracy of three multimodal LLMs, ChatGPT-4o (4o), OpenAI o1 (o1), and Claude 3.5 Sonnet (Sonnet), on questions from the Japanese National Examination for Dental Technicians.

MATERIALS AND METHODS: A total of 240 multiple-choice questions from the theory sections of the 2022 to 2024 examinations were used. Each question, together with any accompanying figures or images, was presented to the three LLMs in a zero-shot manner without specialized prompting. Correct response rates were calculated overall and by question type (text-only vs. visually based) and subject area. Statistical comparisons were performed using Cochran's Q test, followed by McNemar's test with Bonferroni correction where indicated.

RESULTS: Overall correct response rates were 58.3% (4o), 67.5% (o1), and 64.6% (Sonnet). For text-only questions, o1 achieved the highest accuracy (79.1%), significantly outperforming 4o (68.3%; P = 0.017). In contrast, all models showed reduced accuracy on visually based questions (44.6% to 55.4%), with no significant differences among them.

CONCLUSION: These results suggest that multimodal LLMs can supplement theoretical dental technology education, although their limited performance on visual tasks indicates a continuing need for traditional hands-on training. Improved image interpretation by these models may help address workforce challenges in dental technology.
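
The statistical comparison named in the methods (Cochran's Q test across the three models, then pairwise McNemar's tests with Bonferroni correction) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names are hypothetical, the exact binomial form of McNemar's test is an assumption (the abstract does not say which variant was used), the closed-form p-value applies only to three models (2 degrees of freedom), and the six rows of toy data are invented for illustration and are not study results.

```python
from itertools import combinations
from math import comb, exp


def cochran_q(rows):
    """Cochran's Q for rows of 0/1 outcomes (one row per question, one column per model)."""
    k = len(rows[0])
    col_totals = [sum(row[j] for row in rows) for j in range(k)]
    row_totals = [sum(row) for row in rows]
    total = sum(row_totals)
    denom = k * total - sum(r * r for r in row_totals)
    if denom == 0:  # every question answered identically by all models
        return 0.0, k - 1
    q = (k - 1) * (k * sum(c * c for c in col_totals) - total ** 2) / denom
    return q, k - 1


def chi2_sf_df2(x):
    """Upper-tail chi-square probability, valid only for 2 degrees of freedom."""
    return exp(-x / 2.0)


def mcnemar_exact(a, b):
    """Two-sided exact (binomial) McNemar p-value for paired 0/1 outcomes."""
    only_a = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    only_b = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n = only_a + only_b
    if n == 0:  # no discordant pairs, so no evidence of a difference
        return 1.0
    tail = sum(comb(n, i) for i in range(min(only_a, only_b) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def pairwise_bonferroni(rows, names):
    """Bonferroni-adjusted McNemar p-values for every pair of models."""
    pairs = list(combinations(range(len(names)), 2))
    adjusted = {}
    for i, j in pairs:
        p = mcnemar_exact([r[i] for r in rows], [r[j] for r in rows])
        adjusted[(names[i], names[j])] = min(1.0, p * len(pairs))
    return adjusted


# Toy data (invented): 1 = correct, 0 = incorrect; columns are 4o, o1, Sonnet.
toy = [(1, 1, 1), (0, 1, 1), (0, 1, 0), (0, 0, 0), (0, 1, 1), (1, 1, 0)]
q, df = cochran_q(toy)
p_overall = chi2_sf_df2(q)  # df == 2 here because there are three models
adjusted = pairwise_bonferroni(toy, ["4o", "o1", "Sonnet"])
```

In this layout the pairwise tests would be consulted only when the overall Q test is significant, which mirrors the "where indicated" wording in the methods.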