Clinical Performance Tradeoffs of ChatGPT-5.2 Thinking (OpenAI) Compared with Radiologist Interpretation in Biopsy-Referred Mammography: Cancer Detection, False Positives, and Laterality

Abstract

Background/Objectives: Breast cancer screening with mammography supports earlier detection, but variability in interpretation can still lead to missed cancers and avoidable follow-up testing. We evaluated ChatGPT-5.2 Thinking (OpenAI) as a stand-alone model for examination-level malignancy classification on standard bilateral mammography views in a biopsy-referred cohort, compared its performance with that of breast radiologists, and assessed its laterality performance.

Methods: We conducted a retrospective, multicenter diagnostic-accuracy study across breast imaging centers in Saudi Arabia. From an upstream screened cohort (n = 1225), we constructed a biopsy-referred test set of 100 mammography examinations (four 2D views per exam: bilateral CC and MLO; 400 images), including 61 biopsy-confirmed malignancies and 39 biopsy-negative controls, with pathology as the reference standard. Radiologists were blinded to pathology and AI outputs and assigned a BI-RADS category (0–5) and suspected laterality. ChatGPT-5.2 interpreted the same de-identified views using a BI-RADS-guided prompt to generate a BI-RADS category and laterality. Sensitivity, specificity, accuracy, and laterality concordance were then estimated.

Results: ChatGPT-5.2 had higher sensitivity than radiologists (95.08% vs. 81.97%) but markedly lower specificity (10.26% vs. 56.41%), resulting in lower overall accuracy (62.00% vs. 72.00%). The AI produced 58 true positives, 35 false positives, and 3 false negatives, while radiologists produced 50 true positives, 17 false positives, and 11 false negatives. Laterality accuracy among malignant examinations was 60.66%.

Conclusions: In this pathology-anchored, biopsy-referred evaluation, ChatGPT-5.2 identified more cancers but generated substantially more false-positive classifications and showed only moderate breast-side localization. These findings support use as a concurrent aid or prioritization tool rather than a stand-alone reader and motivate efforts to improve specificity and laterality before prospective validation.
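
The reported percentages can be reproduced from the counts given in the abstract. The following is a minimal sketch, not the authors' analysis code. It assumes that true negatives equal the 39 biopsy-negative controls minus the false positives (the abstract does not state true negatives directly), and the class and variable names are hypothetical.

```python
from dataclasses import dataclass

# Cohort size taken from the abstract: 39 biopsy-negative controls.
N_BENIGN = 39


@dataclass(frozen=True)
class ReaderCounts:
    """Examination-level confusion counts for one reader (illustrative container)."""
    tp: int  # malignant exams classified positive
    fp: int  # biopsy-negative exams classified positive
    fn: int  # malignant exams classified negative
    tn: int  # biopsy-negative exams classified negative

    @staticmethod
    def _pct(numerator: int, denominator: int) -> float:
        # Percentage rounded to two decimals, matching the abstract's precision.
        return round(100 * numerator / denominator, 2)

    @property
    def sensitivity(self) -> float:
        return self._pct(self.tp, self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self._pct(self.tn, self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return self._pct(self.tp + self.tn, total)


# Assumption: TN is derived as (biopsy-negative controls - false positives).
ai = ReaderCounts(tp=58, fp=35, fn=3, tn=N_BENIGN - 35)
radiologists = ReaderCounts(tp=50, fp=17, fn=11, tn=N_BENIGN - 17)

for name, reader in (("ChatGPT-5.2", ai), ("Radiologists", radiologists)):
    print(f"{name}: sensitivity={reader.sensitivity}%, "
          f"specificity={reader.specificity}%, accuracy={reader.accuracy}%")
```

Under that assumption the counts imply 4 true negatives for the AI and 22 for the radiologists, which is where the gap in specificity comes from: the AI classified 35 of the 39 biopsy-negative examinations as positive.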
