Abstract
BACKGROUND: Diabetes mellitus is a chronic metabolic disease with rising global prevalence. Adequate patient education is essential to promote self-management and reduce complications. With the broader integration of technology into healthcare, artificial intelligence applications such as ChatGPT have emerged as potential supplementary resources for patient education.

METHODS: A cross-sectional evaluation was conducted using ten frequently asked questions (FAQs) on diabetes, selected from the Diabetic Association of India and the International Diabetes Federation. ChatGPT-4o (accessed via the web interface in March 2025) generated a response to each question in a separate, stand-alone chat session to simulate typical patient interactions. Five board-certified endocrinologists (diabetologists), with a mean of ≥10 years of clinical experience, independently evaluated the responses on a 4-point Likert scale across five domains: overall quality, content accuracy, clarity, relevance, and trustworthiness. Final domain scores were computed as the mean of all five raters' scores. Readability was assessed using the Flesch Reading Ease Score (FRES) and the Flesch-Kincaid Grade Level (FKGL). All readability analyses apply exclusively to the English-language outputs generated in this study.

RESULTS: The mean FRES was 38.19 and the mean FKGL was 16.87, indicating a reading level appropriate for college-educated readers and substantially above the recommended sixth-grade benchmark for patient health materials. Mean response length was 300 ± 100 words across the ten prompts. Expert ratings were generally high: aggregated mean scores (±SD) were 4.0 (±0.0) for content accuracy and overall quality, 3.98 (±0.10) for relevance, and 3.9 (±0.20) for clarity and trustworthiness. The raters identified no clinically inaccurate statements; however, the uniformly high scores and narrow score range indicate a potential ceiling effect that limits discrimination between responses. Raters also expressed concern about linguistic complexity, which may impede comprehension among patients with limited health literacy.

CONCLUSIONS: ChatGPT-4o generated generally accurate and relevant diabetes education content, suggesting potential as a supplementary tool in diabetes care. However, the high reading-level complexity, small evaluation scope (ten prompts, one model, one response per prompt), and English-only assessment limit the generalisability of these findings. AI-generated content should supplement, not replace, clinician-led education. Future work should address language simplification, multilingual evaluation, and longitudinal assessment of patient outcomes.
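For readers unfamiliar with the readability metrics cited above, the standard Flesch formulas are given in the comments of the sketch below. The abstract does not name the tool used to compute FRES and FKGL; the open-source Python textstat package is assumed here purely for illustration, not as the authors' actual pipeline.

```python
# Sketch: computing the readability metrics reported in RESULTS.
# Assumption: the `textstat` package (pip install textstat); the study
# does not specify its tooling.
#
# Standard Flesch formulas (per-text averages):
#   FRES = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
#   FKGL = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
import textstat

def readability(text: str) -> dict:
    """Return Flesch Reading Ease and Flesch-Kincaid Grade Level."""
    return {
        "FRES": textstat.flesch_reading_ease(text),
        "FKGL": textstat.flesch_kincaid_grade(text),
    }

# Hypothetical model response for illustration (not study data):
sample = ("Type 2 diabetes is a chronic condition characterised by "
          "insulin resistance and progressive beta-cell dysfunction.")
print(readability(sample))
```

A FKGL of 16.87, as reported here, corresponds to roughly 17 years of schooling, i.e. college-graduate level, which is why the abstract flags it against the sixth-grade (FKGL ≈ 6) benchmark for patient materials.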
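The per-domain aggregation described in METHODS (the mean of five raters' scores, here taken across the ten questions) amounts to averaging a raters × questions matrix. A minimal sketch with hypothetical ratings follows; the abstract does not specify whether the reported SD is taken across questions or raters, so the per-question convention below is one plausible reading.

```python
# Sketch: aggregating 4-point Likert ratings for one domain.
# The ratings below are hypothetical placeholders, not study data.
import numpy as np

# Shape: (5 raters, 10 questions), scores in 1..4.
ratings = np.array([
    [4, 4, 4, 3, 4, 4, 4, 4, 4, 4],
    [4, 4, 4, 4, 4, 4, 3, 4, 4, 4],
    [4, 3, 4, 4, 4, 4, 4, 4, 4, 4],
    [4, 4, 4, 4, 3, 4, 4, 4, 4, 4],
    [4, 4, 4, 4, 4, 4, 4, 4, 3, 4],
])

domain_mean = ratings.mean()                   # mean over all raters and questions
per_question = ratings.mean(axis=0)            # mean score for each question
domain_sd = per_question.std(ddof=1)           # SD of the per-question means
print(f"{domain_mean:.2f} (±{domain_sd:.2f})")
```

Note how near-ceiling ratings such as these compress the score range, which is the discrimination problem the RESULTS section raises.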