Abstract
Background: ChatGPT is an online chatbot based on a large language model (LLM), developed by OpenAI and launched in November 2022. Early adoption studies have shown high readiness to use this technology for health-related questions and self-diagnosis. However, the quality and clinical adequacy of its health-related responses remain incompletely characterized. This study aimed to explore responses generated by ChatGPT-3.5 and ChatGPT-4.0 to common patient questions regarding scoliosis.

Methods: Ten scoliosis-related frequently asked questions (FAQs) were selected from a pool of over 250 patient-facing questions compiled from 17 publicly available FAQ webpages and informed by a Google Trends analysis. Questions were harmonized, grouped by theme, and then reduced by rule-based expert review to a final set intended to represent common patient concerns.

Results: The median ratings of ChatGPT-3.5 and ChatGPT-4.0 responses ranged from satisfactory requiring minimal clarification (2) to satisfactory requiring moderate clarification (3). Across the ten matched questions, no statistically detectable difference was found between the models in this study setting (W = 8.0, p = 0.59; Cliff's δ = -0.12 [95% CI -0.58, 0.40]). However, given the small question set, the unblinded rating process, and poor inter-rater reliability, this should not be interpreted as evidence of equivalence, non-inferiority, or comparable model performance. The results apply only to the 10-15 April 2024 online snapshots of ChatGPT-3.5 and ChatGPT-4.0 and should not be generalized to later model iterations.

Conclusions: This study should be interpreted as a clinically oriented observational report, intended to inform physician awareness and patient-physician communication rather than to validate chatbot accuracy or safety. In this 10-15 April 2024 sample, outputs from both models frequently required clinician clarification. Given the small FAQ set, low inter-rater reliability, unblinded design, and single-sample outputs, the findings do not establish equivalence or superiority and apply only to the specific 10-15 April 2024 model snapshots and evaluated questions.
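The paired comparison reported in the Results (a Wilcoxon signed-rank test with Cliff's δ as the effect size) can be outlined in a short script. The sketch below is illustrative only: the rating vectors are hypothetical placeholders, not the study data, and SciPy/NumPy are assumed dependencies; it is not the authors' analysis code.

```python
# Illustrative sketch: paired comparison of ordinal clarification ratings
# for matched questions, using a Wilcoxon signed-rank test and Cliff's delta.
# The rating vectors below are hypothetical, not the study's actual data.
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical 1-4 clarification ratings for the same 10 questions
ratings_gpt35 = np.array([2, 3, 2, 3, 2, 3, 3, 2, 2, 3])
ratings_gpt4 = np.array([2, 2, 3, 3, 2, 2, 3, 2, 3, 2])

# Wilcoxon signed-rank test on the paired differences
# (zero differences are dropped under SciPy's default zero_method)
stat, p_value = wilcoxon(ratings_gpt35, ratings_gpt4)

# Cliff's delta: proportion of pairwise comparisons where GPT-3.5 ratings
# exceed GPT-4.0 ratings minus the reverse, over all cross-pairs
diff_sign = np.sign(ratings_gpt35[:, None] - ratings_gpt4[None, :])
cliffs_delta = diff_sign.mean()

print(f"W = {stat:.1f}, p = {p_value:.2f}, Cliff's delta = {cliffs_delta:.2f}")
```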