An assessment of ChatGPT in error detection for thyroid ultrasound reports: A comparative study with ultrasound physicians

Abstract

BACKGROUND: This study evaluates the performance of GPT-4o in detecting errors in ACR TIRADS ultrasound reports and its potential to reduce report generation time.

METHODS: A retrospective analysis was conducted on 200 thyroid ultrasound reports from the Second Affiliated Hospital of Fujian Medical University. Each report was categorized as either correct or containing up to three errors. GPT-4o was compared with ultrasound physicians of varying experience levels on error detection and processing time.

RESULTS: GPT-4o detected 90.0% (180/200) of errors, slightly fewer than the 93.0% (186/200) detected by the best-performing senior ultrasound physician; the difference was not significant (p = 0.281). GPT-4o's error detection rate was comparable to that of the ultrasound physicians overall (p = 0.098 to 0.866), and it outperformed Resident 2 in detecting diagnostic errors (87% vs. 69%). Agreement between readers was low (Cohen's kappa = 0 to 0.31). GPT-4o reviewed the reports significantly faster than all ultrasound physicians (0.79 h vs. 1.8 to 3.1 h, p < 0.001), making it a reliable and efficient tool for error detection in medical imaging.

CONCLUSIONS: GPT-4o is comparable to experienced ultrasound physicians in error detection and significantly improves report processing efficiency, offering a valuable tool for enhancing diagnostic accuracy and aiding junior residents.
