Evaluation of ChatGPT-4 as an Online Outpatient Assistant in Puerperal Mastitis Management: Content Analysis of an Observational Study

ChatGPT-4作为产褥期乳腺炎管理在线门诊助手的评价:一项观察性研究的内容分析

阅读:1

Abstract

BACKGROUND: The integration of artificial intelligence (AI) into clinical workflows holds promise for enhancing outpatient decision-making and patient education. ChatGPT, a large language model developed by OpenAI, has gained attention for its potential to support both clinicians and patients. However, its performance in the outpatient setting of general surgery remains underexplored. OBJECTIVE: This study aimed to evaluate whether ChatGPT-4 can function as a virtual outpatient assistant in the management of puerperal mastitis by assessing the accuracy, clarity, and clinical safety of its responses to frequently asked patient questions in Turkish. METHODS: Fifteen questions about puerperal mastitis were sourced from public health care websites and online forums. These questions were categorized into general information (n=2), symptoms and diagnosis (n=6), treatment (n=2), and prognosis (n=5). Each question was entered into ChatGPT-4 (September 3, 2024), and a single Turkish-language response was obtained. The responses were evaluated by a panel consisting of 3 board-certified general surgeons and 2 general surgery residents, using five criteria: sufficient length, patient-understandable language, accuracy, adherence to current guidelines, and patient safety. Quantitative metrics included the DISCERN score, Flesch-Kincaid readability score, and inter-rater reliability assessed using the intraclass correlation coefficient (ICC). RESULTS: A total of 15 questions were evaluated. ChatGPT's responses were rated as "excellent" overall by the evaluators, with higher scores observed for treatment- and prognosis-related questions. A statistically significant difference was found in DISCERN scores across question types (P=.01), with treatment and prognosis questions receiving higher ratings. In contrast, no significant differences were detected in evaluator-based ratings (sufficient length, understandability, accuracy, guideline compliance, and patient safety), JAMA benchmark scores, or Flesch-Kincaid readability levels (P>.05 for all). Interrater agreement was good across all evaluation parameters (ICC=0.772); however, agreement varied when assessed by individual criteria. Correlation analyses revealed no significant overall associations between subjective ratings and objective quality measures, although a strong positive correlation between literature compliance and patient safety was identified for one question (r=0.968, P<.001). CONCLUSIONS: ChatGPT demonstrated adequate capability in providing information on puerperal mastitis, particularly for treatment and prognosis. However, evaluator variability and the subjective nature of assessments highlight the need for further optimization of AI tools. Future research should emphasize iterative questioning and dynamic updates to AI knowledge bases to enhance reliability and accessibility.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。