Abstract
BACKGROUND: Artificial intelligence (AI) is increasingly being explored in healthcare, particularly in emergency department (ED) triage. This study aimed to evaluate the effectiveness of the AI chatbot ChatGPT in triaging patients, focusing on its accuracy, safety, efficiency, and impact on patient care.

METHODS: A prospective observational study was conducted at the ED of King Saud Medical City (KSMC) in Riyadh, Saudi Arabia, with a sample of 138 patients. Patients requiring immediate resuscitation were excluded. ED physicians assigned triage scores using the Canadian Triage and Acuity Scale (CTAS), and ChatGPT then generated scores for the same patients. In cases of discrepancy, the final decision of the senior ED consultant was considered the gold standard. The study assessed inter-rater reliability between the AI and human raters and evaluated the accuracy of each against the consultant's assessment.

RESULTS: ChatGPT and ED physicians agreed in 85.61% of cases, with substantial inter-rater reliability (κ = 0.780, 95% confidence interval [CI] 0.676-0.884, p < 0.001). Agreement between ED physicians and consultants was 63.9%, with moderate reliability (κ = 0.406, 95% CI 0.006-0.806, p = 0.018). Consultants assigned lower acuity levels than physicians in most cases. ChatGPT's accuracy compared to the consultant was 42.86%, with slight reliability, and it tended to overestimate acuity, particularly in critical cases. However, it performed better at mid-range acuity levels.

CONCLUSION: The findings suggest that AI could support ED triage by aligning closely with human decision-making. However, its overestimation of severity could lead to over-triage and increased resource use. Limitations included a small sample size and the use of a general-purpose AI model not specifically trained for medical triage. Future research should focus on AI models tailored for ED triage to improve reliability and clinical applicability.
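
For readers unfamiliar with the agreement statistics reported above, the following is a minimal sketch of how percent agreement and Cohen's kappa are computed for two raters. It assumes the unweighted form of kappa; the abstract does not state whether a weighted variant was used. The function name `cohen_kappa` and the ten CTAS ratings are hypothetical and purely illustrative; they are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Return (observed agreement, unweighted Cohen's kappa) for two raters.

    Both inputs are equal-length sequences of categorical labels,
    e.g. CTAS levels 1-5 assigned to the same patients.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: share of patients given the same level by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each level, multiply the two raters' marginal
    # proportions, then sum over all levels used by either rater.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = count_a.keys() | count_b.keys()
    p_e = sum(count_a[k] * count_b[k] for k in labels) / (n * n)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use a single identical label")
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical CTAS levels for ten patients (illustrative only, NOT study data).
physician = [2, 3, 3, 4, 2, 5, 3, 4, 3, 2]
chatbot = [2, 3, 2, 4, 2, 5, 3, 3, 3, 2]

agreement, kappa = cohen_kappa(physician, chatbot)
print(f"agreement = {agreement:.2%}, kappa = {kappa:.3f}")
```

Kappa is lower than raw agreement because it discounts the matches expected by chance when both raters use the same few levels often. This is why a pair of raters can show a fairly high agreement rate and still only moderate reliability.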