Abstract
BACKGROUND: The increasing incidence of diabetes places a significant burden on healthcare systems. Little research exists on tools that help providers develop personalized glucose-lowering strategies, which could ease this burden and improve patient outcomes.
OBJECTIVE: This study aims to evaluate the capability of ChatGPT-4o in developing personalized glucose-lowering strategies for individuals with diabetes.
METHODS: First, ChatGPT-4o's performance was evaluated on China's qualification examination for attending physicians in endocrinology. Second, a cross-sectional study compared the glucose-lowering strategies formulated by ChatGPT-4o, general practitioners (GPs), and attending physicians (APs) in endocrinology for a set of 30 real-world diabetes cases. Three clinical experts blindly scored the reasonableness of each strategy on a scale. Cases were stratified into three complexity levels (A, B, and C), and mean scores were evaluated for each level.
RESULTS: ChatGPT-4o passed all sections of the qualification examination, with scores above the 60% threshold. In developing glucose-lowering strategies, ChatGPT-4o achieved a mean score comparable to that of GPs (82.24 ± 9.933 vs 79.83 ± 3.768; p = .317) but lower than that of APs (82.24 ± 9.933 vs 86.35 ± 4.142; p = .0467). Performance declined with increasing case complexity, with mean scores dropping from 89.90 ± 2.936 for simple cases (A-level) to 76.12 ± 11.93 for complex cases (C-level) (p < .0020).
CONCLUSIONS: ChatGPT-4o performs reliably in generating glucose-lowering strategies for simpler diabetes cases, highlighting its potential to assist community health workers. However, its accuracy in complex cases, especially concerning medication contraindications, requires improvement.
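The stratified scoring described in the methods can be illustrated with a minimal sketch. It is not the study's analysis code. The ratings below are invented for illustration, and the aggregation rule (each case's score is the mean of its three raters, then mean and sample standard deviation per complexity level) is an assumption, since the abstract does not state how rater scores were combined. The names `ratings` and `summarize_by_level` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical ratings: (case_id, complexity_level, [scores from three raters]).
# These numbers are invented for illustration and are NOT the study's data.
ratings = [
    ("case01", "A", [90, 88, 92]),
    ("case02", "A", [85, 87, 89]),
    ("case03", "B", [80, 84, 82]),
    ("case04", "B", [78, 76, 80]),
    ("case05", "C", [70, 74, 72]),
    ("case06", "C", [60, 66, 63]),
]


def summarize_by_level(rows):
    """Return {level: (mean, sample SD)} of per-case scores.

    Assumption: a case's score is the mean of its raters' scores.
    """
    per_level = defaultdict(list)
    for _case_id, level, scores in rows:
        per_level[level].append(mean(scores))
    return {
        level: (mean(values), stdev(values))
        for level, values in sorted(per_level.items())
    }


summary = summarize_by_level(ratings)
for level, (level_mean, level_sd) in summary.items():
    # Print in the "mean ± SD" form used in the results.
    print(f"Level {level}: {level_mean:.2f} ± {level_sd:.2f}")
```

With real data, the same per-level summaries would feed whichever significance test the study used. The abstract does not name that test, so none is shown here.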