Abstract
BACKGROUND: Neonatal gastrointestinal surgical emergencies (NGSEs) require rapid decisions to prevent morbidity and mortality. This study assessed the potential of ChatGPT (GPT-4o) to support clinical decision-making for NGSEs. METHODS: Five challenging NGSE cases (ileal atresia, midgut volvulus, Hirschsprung disease, meconium ileus, and pseudo-obstruction) were converted into structured short-answer questions that included clinical histories and radiologic images. The questions covered differential diagnosis, diagnostic plan, management plan, final diagnosis, and surgical plan. Each case was scored out of 10 points, for a maximum total of 50. The scenarios were presented to 10 general surgery (GS) residents, 10 GS attendings, and 10 pediatric surgery (PS) attendings. GPT-4o was tested with 10 iterations per case. Group scores were compared using appropriate statistical tests. RESULTS: GPT-4o achieved a mean total score of 44.95 out of 50 (89.9%), higher than GS residents (27.05, p<0.001) and GS attendings (28.35, p<0.001) but lower than PS attendings (47.70, p=0.021). In subgroup analysis, GPT-4o matched PS attendings in management plan, final diagnosis, and surgical plan, but scored lower in differential diagnosis (87.8% vs. 92.8%, p=0.0479) and diagnostic plan (75.0% vs. 93.8%, p<0.001). Compared with GS residents and GS attendings, GPT-4o performed significantly better in all categories except diagnostic plan. CONCLUSIONS: GPT-4o performed comparably to PS attendings in management plan, final diagnosis, and surgical plan, and clearly surpassed GS residents and GS attendings overall. These findings suggest that GPT-4o may have potential as a supplementary decision-support tool for NGSEs, although clinical use requires further validation in real-world settings.