Can AI-Based ChatGPT Models Accurately Analyze Hand-Wrist Radiographs? A Comparative Study

基于人工智能的 ChatGPT 模型能否准确分析手腕 X 光片?一项对比研究

阅读:1

Abstract

Background/Aims: The aim of this study was to evaluate the effectiveness of large language model (LLM)-based chatbot systems in predicting bone age and identifying growth stages, and to explore their potential as practical, infrastructure-independent alternatives to conventional methods and convolutional neural network (CNN)-based deep learning models. Methods: This study evaluated the performance of three ChatGPT-based models (GPT-4o, GPT-o4-mini-high, and GPT-o1-pro) in predicting bone age and growth stage using 90 anonymized hand-wrist radiographs (30 from each growth stage-pre-peak, peak, and post-peak-with equal male and female distribution). Reference standards were ensured by expert orthodontists using Fishman's Skeletal Maturity Indicators (SMI) system and the Greulich-Pyle Atlas, with each radiograph analyzed by three GPT models using standardized prompts. Model performances were evaluated through statistical analyses assessing agreement and prediction accuracy. Results: All models showed significant agreement with the reference values in bone age prediction (p < 0.001), with GPT-o1-pro having the highest concordance (Pearson r = 0.546). No statistically significant difference was observed in the mean absolute error (MAE) among the models (p > 0.05). The GPT-o4-mini-high model achieved an accuracy rate of 72.2% within a ±2 year deviation range for bone age prediction. The GPT-o1-pro and GPT-o4-mini-high models showed bias in the Bland-Altman analysis of bone age predictions; however, GPT-o1-pro yielded more reliable predictions with narrower limits of agreement. In terms of growth stage classification, the GPT-4o model achieved the highest agreement with the reference values (κ = 0.283, p < 0.001). Conclusions: This study shows that general-purpose GPT models can support bone age and growth stages prediction, with each model having distinct strengths. While GPT models do not replace clinical examination, their contextual reasoning and ability to perform preliminary assessments without domain-specific training make them promising tools, though further development is needed.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。