Abstract
BACKGROUND/PURPOSE: Large language models (LLMs) show significant potential for clinical decision support, yet their application to endodontic disease remains underexplored.
MATERIALS AND METHODS: This study assessed the decision-making capabilities of three advanced LLMs (GPT-4o, Claude 3.5, and Grok2) in specialized endodontic contexts. A question bank of 421 multiple-choice questions was constructed across 27 core endodontic topics spanning theory and procedures, together with 35 complex clinical cases. The three LLMs were tested using standardized prompts, and performance was evaluated via topic-stratified accuracy analysis.
RESULTS: Claude 3.5 achieved the highest overall accuracy (73.39%), followed by Grok2 (66.27%) and GPT-4o (46.32%). Grok2 excelled in complex case analysis (69.57%). All models performed strongly in theoretical domains (e.g., clinical examination, structural function, pharmacology) but showed limitations in complex scenarios and procedural techniques.
CONCLUSION: LLMs hold promise as endodontic decision support tools, though domain-specific refinement is essential for effective clinical application.
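The topic-stratified accuracy analysis mentioned in the methods can be illustrated with a minimal sketch: given per-question results labeled by topic, accuracy is computed separately for each stratum. The function name and sample data below are illustrative assumptions, not taken from the study itself.

```python
from collections import defaultdict

def topic_stratified_accuracy(results):
    """Compute per-topic accuracy from (topic, is_correct) pairs.

    results: iterable of (topic: str, is_correct: bool), one per question.
    Returns a dict mapping each topic to its fraction of correct answers.
    """
    totals = defaultdict(int)   # questions seen per topic
    correct = defaultdict(int)  # correct answers per topic
    for topic, is_correct in results:
        totals[topic] += 1
        correct[topic] += int(is_correct)
    return {t: correct[t] / totals[t] for t in totals}

# Hypothetical example: two pharmacology questions, one complex case
sample = [
    ("pharmacology", True),
    ("pharmacology", False),
    ("complex_case", True),
]
print(topic_stratified_accuracy(sample))  # → {'pharmacology': 0.5, 'complex_case': 1.0}
```

Stratifying by topic, rather than reporting a single pooled accuracy, is what allows the contrast between strong theoretical domains and weaker procedural performance to surface.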