Abstract
BACKGROUND: Detecting and localizing fractures in radiographic images is essential for effective trauma diagnosis and for supporting image interpretation. Despite the potential of deep learning in musculoskeletal imaging, classification quality and localization stability remain significant issues. PURPOSE: The purpose of this work is to design and test a deep learning system that classifies fractures in radiographic images using convolutional neural networks and localizes them using YOLOv8-based object detection. METHODS: A retrospective secondary data analysis was performed on publicly available, de-identified radiographic data. For fracture classification, three transfer-learning backbones were evaluated: ResNet18, MobileNetV3-Small, and EfficientNet-B0, each trained under repeated stratified cross-validation with early stopping. Model performance was evaluated using area under the receiver operating characteristic curve (AUROC), average precision (AP), Brier score, accuracy, precision, recall, specificity, and F1-score. Temperature scaling was used for probability calibration, and nested threshold optimization was used to compare performance at various operating points. For fracture localization, YOLOv8n, YOLOv8s, and YOLOv8m detectors were trained, then compared on both the validation and test sets using precision, recall, mAP@0.5, and mAP@0.5:0.95. RESULTS: MobileNetV3-Small was the best-performing backbone overall, although classification discrimination was generally low. Calibration analysis showed that the probability distributions and reliability properties changed under temperature scaling, and threshold optimization revealed substantial differences in sensitivity, precision, specificity, and F1-score across decision cutoffs.
In the localization experiment, performance varied across the YOLOv8 detector variants; test-set mAP was highest at the 0.5 IoU threshold (mAP@0.5), and per-class performance varied most across anatomical fracture types. These results indicate that, in the current experimental setup, the localization component of the framework performed better and more consistently than the classification component. CONCLUSION: The presented framework combines fracture classification, probability calibration, threshold analysis, and radiographic localization. Although the classification component showed poor discriminative performance, the YOLOv8 localization results were comparatively better, which supports the usefulness of detector-based fracture localization in this setting. Clinical translation will require further external validation, prospective assessment, and comparison with expert readers.
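
The calibration and threshold steps named in the methods can be illustrated with a minimal sketch. It is not the study's implementation: it assumes binary fracture logits, fits the temperature by a plain grid search on held-out data, and selects a single F1-maximizing cutoff without the nesting described above. The function names and the toy logits are hypothetical and are not results from this work.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def nll(logits, labels, temperature):
    """Mean binary negative log-likelihood of sigmoid(logit / T)."""
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z / temperature), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)

def fit_temperature(logits, labels, grid=None):
    """Pick T > 0 that minimizes held-out NLL (assumed: simple grid search)."""
    if grid is None:
        grid = [0.05 * k for k in range(1, 201)]  # 0.05 .. 10.0
    return min(grid, key=lambda t: nll(logits, labels, t))

def brier_score(probs, labels):
    """Mean squared difference between predicted probability and label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def best_f1_threshold(probs, labels):
    """Return (cutoff, F1) maximizing F1 over the observed probabilities."""
    best = (0.5, -1.0)
    for t in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best[1]:
            best = (t, f1)
    return best

# Toy, deliberately overconfident held-out logits (illustrative only).
val_logits = [4.0, 3.5, -3.0, 5.0, -4.5, 2.5, -2.0, 3.0]
val_labels = [1, 0, 0, 1, 0, 1, 1, 0]

T = fit_temperature(val_logits, val_labels)
raw = [sigmoid(z) for z in val_logits]
cal = [sigmoid(z / T) for z in val_logits]
cutoff, f1 = best_f1_threshold(cal, val_labels)

print("fitted temperature:", T)
print("Brier raw / calibrated:", brier_score(raw, val_labels), brier_score(cal, val_labels))
print("F1-maximizing cutoff and F1:", cutoff, f1)
```

Because temperature scaling is monotonic in the logit, it changes the probability values and the Brier score but not the ranking of cases, so AUROC is unaffected. Choosing the cutoff on the same data used for evaluation is optimistic, which is the bias a nested scheme is meant to avoid.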