Abstract
PURPOSE: This study evaluated transfer learning for classifying skeletal metastases on bone scintigraphy. The primary objective was to assess its performance in detecting skeletal metastases; the secondary objective was to compare that performance with that of human readers. METHODS: A total of 2,510 patients with known malignancies were included: 2,368 retrospectively recruited and 142 prospectively enrolled. Scans were categorized as normal, benign-degenerative, or metastasis based on clinical consensus, follow-up, biopsy, radiology, or SPECT/CT findings. The retrospective data were randomly divided into training (1,895) and validation (473) sets, while the prospective cohort served as an independent testing set. Google's InceptionV3 was used for image embedding, and 13 supervised ML algorithms were tested. The Log Loss value of a random classifier was used as the threshold for selecting models for testing, and the Stuart-Maxwell test was used to compare the models' performance with that of human readers. RESULTS: Eight ML models with Log Loss values below that of the random classifier achieved AUCs > 0.900 on training and validation, and all but one (Support Vector Machine) maintained AUCs > 0.900 on testing. Logistic Regression performed best (≥ 0.993 in all metrics), while Neural Networks, Gradient Boosting, and Random Forest also demonstrated robust performance (≥ 0.817 in all metrics). Notably, the ML models interpreted 142 images in 0.027-1.770 s, compared with 10.07-18.00 min for human readers. Less experienced readers performed significantly worse than the models (P ≤ 0.002), whereas the experienced reader's performance was comparable (P ≥ 0.280). CONCLUSION: Transfer learning demonstrates commendable performance in classifying skeletal metastases on bone scintigraphy, outperforming less experienced readers while matching the experienced reader's performance. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s13139-025-00927-z.
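The model-selection rule described in METHODS (keep only models whose Log Loss is below that of a random classifier) can be illustrated with a minimal Python sketch. It assumes the random classifier assigns equal probability to each of the three classes, which gives a Log Loss of ln(3) ≈ 1.0986; the abstract does not state how the baseline was defined, so this is an assumption. The function names and the toy labels below are hypothetical and are not taken from the study.

```python
import math

# Three scan categories named in the abstract.
CLASSES = ("normal", "benign-degenerative", "metastasis")


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean multiclass cross-entropy.

    y_true: list of class indices; y_prob: list of per-class probability rows.
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Clip to avoid log(0) for overconfident wrong predictions.
        p = min(max(probs[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


def random_classifier_log_loss(n_classes):
    """Log Loss of a classifier that predicts 1/n_classes for every class.

    ASSUMPTION: 'random classifier' is taken to mean uniform probabilities.
    """
    return math.log(n_classes)


def select_models(val_losses, n_classes):
    """Return names of models whose validation Log Loss beats the baseline."""
    threshold = random_classifier_log_loss(n_classes)
    return [name for name, loss in val_losses.items() if loss < threshold]


# Toy check with hypothetical labels: uniform predictions reproduce ln(3).
toy_labels = [0, 1, 2, 2]
uniform_probs = [[1 / 3, 1 / 3, 1 / 3] for _ in toy_labels]
baseline = log_loss(toy_labels, uniform_probs)
```

Under this assumption, a model is carried forward to the testing set only if its Log Loss falls below roughly 1.0986. A baseline that predicts the class prevalences instead of uniform probabilities would yield a different, generally lower, threshold.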