Abstract
INTRODUCTION: We present the state of the art of ultrasound-based machine learning (ML) radiomics models for ovarian masses and analyze their accuracy in differentiating benign from malignant adnexal masses.
MATERIAL AND METHODS: Web of Science, PubMed, and Scopus databases were searched. All studies were imported into RAYYAN QCRI software. Studies were included if they developed, and internally or externally validated, ML models using only radiomics features extracted from ultrasound images. The overall quality of the included studies was assessed using the QUADAS-AI tool. Summary sensitivity and specificity were reported with corresponding 95% confidence intervals (CIs).
RESULTS: Twelve studies developed ML models including only radiomics features extracted from ultrasound images, and six of them were included in the meta-analysis. In the validation set, the overall sensitivity and specificity for differentiating benign from malignant adnexal masses were 0.80 (95% CI 0.74-0.87) and 0.86 (95% CI 0.80-0.90), respectively. All studies demonstrated a high risk of bias in the subject selection domain (e.g., lack of details on image sources or scanner models; absence of image preprocessing), and the majority also showed a high risk of bias in the index test domain (e.g., models were not validated on external datasets). In contrast, the risk of bias was generally low in the reference standard domain (i.e., most studies used a reference standard that accurately identified the target condition) and the testing workflow domain (i.e., the time interval between the index test and the reference standard was appropriate).
CONCLUSIONS: The good performance of ultrasound-based radiomics models in the validation set suggests that radiomics is worth exploring to improve the diagnosis of adnexal masses. To date, however, the studies carry a high risk of bias owing to small sample sizes, single-setting designs, and a lack of external validation.