Abstract
OBJECTIVE: Differentiating ulcerative colitis (UC) from Crohn's disease (CD) is challenging, particularly for nonexperts. Although artificial-intelligence-based image analysis has advanced endoscopic diagnosis, large language models require clinical validation in inflammatory bowel disease (IBD). We evaluated the ability of ChatGPT to distinguish UC from CD using colonoscopy (CS) images with and without clinical information. METHODS: We retrospectively analyzed 386 patients with UC and 161 patients with CD, all with active disease, who underwent CS between April 2001 and May 2025. A representative endoscopic image showing severe activity at the initial flare was selected by a nonspecialist. Data were collected on lesion continuity and perianal disease. ChatGPT was asked to (1) classify each case as UC or CD and (2) estimate the probability of UC, using images alone or images plus clinical information. IBD specialists performed task (1) under the same conditions, and diagnostic performance was compared. RESULTS: The median age was 36.5 years in the UC group and 28 years in the CD group. Diagnostic accuracy without clinical information was 75.6% for ChatGPT and 84.9% for the specialists, increasing to 87.4% and 88.7%, respectively, with clinical information. The odds ratios for a correct diagnosis increased markedly when clinical data were used. Receiver operating characteristic analysis of ChatGPT yielded areas under the curve of 0.750 without and 0.948 with clinical information. CONCLUSION: ChatGPT accurately discriminated between UC and CD, and its diagnostic accuracy increased markedly with the integration of clinical information, suggesting applicability in clinical practice despite lower accuracy than IBD specialists.