Abstract
Background/Objectives: To develop an automated deep learning-based bone age prediction model using the Tanner-Whitehouse 3 (TW3) method and evaluate its feasibility by comparing its performance with that of pediatric radiologists. Methods: Hand and wrist radiographs of 560 Korean children and adolescents (280 female, 280 male; mean age 9.43 ± 2.92 years) were evaluated by the TW3-based model and by three pediatric radiologists. Images with bony destruction, congenital anomalies, or non-diagnostic quality were excluded. A commercially available AI solution built upon the Rotated Single Shot MultiBox Detector (SSD) and EfficientNet-B0 was used. Bone age measurements from the model and the radiologists were compared using paired t-tests. Linear regression analysis was performed, and the coefficient of determination (r²), mean absolute error (MAE), and root mean square error (RMSE) were calculated. A Bland-Altman analysis was conducted, and the proportion of bone age predictions within 0.6 years of the radiologists' assessments was calculated. Results: Bone age measurements from the TW3-based model did not differ significantly from the radiologists' assessments, except in participants younger than 6 or older than 13 years (overall, p = 0.874; 6-8 years, p = 0.737; 8-9 years, p = 0.093; 9-10 years, p = 0.301; 10-11 years, p = 0.584; 11-13 years, p = 0.976; <6 or >13 years, p < 0.001). There was a strong linear correlation between the model predictions and the radiologists' assessments (r² = 0.977). The RMSE and MAE of the model were 0.529 years (95% CI, 0.482-0.575) and 0.388 years (95% CI, 0.361-0.417), respectively. Overall, 82.3% of the model's bone age predictions were within 0.6 years of the radiologists' interpretation. Conclusions: Automated deep learning-based bone age assessment has the potential to reduce radiologists' workload and provide standardized measurements for clinical decision making.
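
The agreement statistics named in the Methods (MAE, RMSE, Bland-Altman bias with limits of agreement, and the proportion of predictions within 0.6 years) can be illustrated with a minimal sketch. This is not the study's analysis code: the function name `agreement_metrics`, the use of a single reference bone age per subject, the inclusive 0.6-year threshold, and the 1.96 × SD limits of agreement are all assumptions made for illustration, and the example values are invented.

```python
import math
from statistics import mean, stdev


def agreement_metrics(model_ages, reader_ages, tolerance=0.6):
    """Summarize agreement between model and reader bone ages (in years).

    Hypothetical helper: assumes one reference reading per subject and
    treats a difference exactly equal to `tolerance` as within tolerance.
    """
    if len(model_ages) != len(reader_ages) or len(model_ages) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")

    # Per-subject difference: model prediction minus reader assessment.
    diffs = [m - r for m, r in zip(model_ages, reader_ages)]

    mae = mean(abs(d) for d in diffs)
    rmse = math.sqrt(mean(d * d for d in diffs))

    # Bland-Altman: mean difference (bias) and 95% limits of agreement,
    # taken here as bias +/- 1.96 sample standard deviations.
    bias = mean(diffs)
    sd = stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)

    # Share of predictions within `tolerance` years of the reader value.
    within = sum(abs(d) <= tolerance for d in diffs) / len(diffs)

    return {
        "mae": mae,
        "rmse": rmse,
        "bias": bias,
        "limits_of_agreement": limits,
        "within_tolerance": within,
    }


# Illustrative values only; these are not data from the study.
example = agreement_metrics([8.0, 9.5, 11.0, 12.5], [8.5, 9.5, 10.0, 12.0])
```

Because RMSE squares each difference before averaging, it weights large disagreements more heavily than MAE does, which is why the reported RMSE (0.529 years) exceeds the reported MAE (0.388 years).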