Abstract
PURPOSE: Developmental dysplasia of the hip (DDH) requires timely, guideline-concordant decisions to prevent long-term morbidity. ChatGPT-5.0 may support clinicians, especially where pediatric orthopedic expertise is limited, but its reliability across typical and discordant presentations is uncertain. This scenario-based validation study evaluated the accuracy of ChatGPT-5.0's management recommendations for DDH in 30 structured clinical cases, comparing its outputs against the AAOS (2022) and AAP (2016) guidelines. METHODS: Thirty unique cases were used: 20 concordant scenarios (aligned clinical and imaging findings) spanning the age groups assessed by Graf ultrasound classification and by radiographic acetabular index, and 10 mismatch scenarios with correct clinical examinations but intentionally erroneous radiology. The primary outcome was guideline-concordant accuracy, categorized as correct, partially correct, undertreatment, overtreatment, or incorrect. Secondary outcomes were the effect of error-aware prompts and multilingual consistency. RESULTS: In concordant scenarios, guided ChatGPT gave correct recommendations in 100% of cases, while non-logged-in ChatGPT was correct in 95% (19 of 20), with one overtreatment. In mismatch scenarios, guided ChatGPT frequently tended toward overtreatment and failed to recommend repeat ultrasound or urgent pediatric orthopedic consultation. Non-logged-in ChatGPT performed better in mismatch cases but similarly under-emphasized remeasurement and consultation. Error-aware prompts did not materially alter recommendations in either environment. Swahili queries produced outputs clinically identical to the English responses. CONCLUSIONS: ChatGPT-5.0 provides reliable, guideline-concordant guidance for DDH when clinical and radiologic data are concordant, supporting its potential use as a decision aid in settings without immediate pediatric orthopedic access. Safe clinical implementation requires human oversight and the integration of guideline-based safety checks to prevent mismanagement in ambiguous cases.