Abstract
OBJECTIVE: To determine the accuracy of a custom version of the generative pretrained transformer (GPT)-4o large language model (LLM) in identifying PICU admissions with vs. without bacterial pneumonia using clinical notes.
DESIGN: In this retrospective cohort study, the GPT-4o model was given guidance on our institution's pneumonia diagnosis practices through a custom prompt and instructed to analyze PICU provider notes from the first 2 calendar days of PICU admission to identify bacterial pneumonia diagnoses. Diagnoses from the manually curated Virtual Pediatric Systems (VPS) Registry were used as the gold standard.
SETTING: A 48-bed, academic, quaternary care PICU.
PATIENTS: Children 3 months to 18 years old admitted to the PICU from January 1, 2023, to December 31, 2023.
INTERVENTIONS: None.
MEASUREMENTS AND MAIN RESULTS: GPT-4o analyzed 10,081 notes from 3,317 PICU admissions in 5.0 minutes (mean 0.03 s per note). Of the 3,317 study encounters, 481 (14.5%) had a VPS admission diagnosis of pneumonia. GPT-4o classified 3,143 of 3,317 (94.8%) encounters in agreement with VPS. In a post hoc adjudication analysis, a blinded PICU attending reviewed the charts of the 174 encounters in which the VPS and GPT-4o classifications were discordant. The GPT-4o classification matched that of the blinded PICU attending in 125 of these 174 (71.8%) encounters. The most common reason for incorrect classification by GPT-4o was that a pneumonia diagnosis was listed in the initial notes but later rescinded when a different diagnosis was identified.
CONCLUSIONS: The GPT-4o LLM accurately and rapidly identified critically ill children with vs. without bacterial pneumonia. This study suggests that similar tools could be developed to automate and accelerate processes that typically require manual chart review.
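The percentages reported above follow directly from the stated counts. As a minimal sketch of that arithmetic (the variable names and the `pct` helper are illustrative assumptions, not part of the study's analysis code), the figures can be rechecked as follows:

```python
# Counts taken from the abstract; names are illustrative only.
total_encounters = 3317      # PICU admissions analyzed
vps_pneumonia = 481          # encounters with a VPS pneumonia diagnosis
concordant = 3143            # encounters where GPT-4o agreed with VPS
notes = 10081                # provider notes analyzed
runtime_seconds = 5.0 * 60   # reported total runtime of 5.0 minutes
attending_agreed_gpt = 125   # discordant encounters where the attending sided with GPT-4o


def pct(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Discordant encounters are those not classified in agreement with VPS.
discordant = total_encounters - concordant

prevalence_pct = pct(vps_pneumonia, total_encounters)
agreement_pct = pct(concordant, total_encounters)
adjudication_pct = pct(attending_agreed_gpt, discordant)
seconds_per_note = round(runtime_seconds / notes, 2)

print(discordant, prevalence_pct, agreement_pct, adjudication_pct, seconds_per_note)
```

Note that the 94.8% figure measures agreement with the VPS registry, while the 71.8% figure applies only to the discordant subset reviewed in the post hoc adjudication.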