Abstract
BACKGROUND: Varying patterns of angiogenesis are observed across molecular subtypes of breast cancer (BC). This study aimed to develop and validate machine learning (ML) models for identifying molecular subtypes of BC using contrast-enhanced ultrasound (CEUS) and superb microvascular imaging (SMI). METHODS: In this prospective study, 191 BC patients with 193 lesions were enrolled. Clinical data, CEUS parameters, and SMI features were collected, and recursive feature elimination was applied for feature selection. Random forest (RF), support vector machine (SVM), and logistic regression (LR) models were trained to distinguish molecular subtypes, and their diagnostic performance was compared. Model interpretability was achieved using SHapley Additive exPlanations (SHAP). RESULTS: BC lesions were randomly assigned to training (n=135) and test (n=58) cohorts in a 7:3 ratio. Fivefold cross-validation with five repetitions was used for hyperparameter tuning. SVM effectively distinguished luminal subtypes, achieving areas under the curve (AUCs) of 0.955 [95% confidence interval (CI): 0.914-0.996] in the training cohort and 0.874 (95% CI: 0.769-0.979) in the test cohort. RF outperformed the other models for the human epidermal growth factor receptor 2 (HER2)-overexpressed subtype, with AUCs of 0.944 (95% CI: 0.902-0.986) and 0.872 (95% CI: 0.768-0.975) in the training and test cohorts, respectively. LR excelled in differentiating triple-negative breast cancer (TNBC), yielding AUCs of 0.846 (95% CI: 0.758-0.933) and 0.824 (95% CI: 0.704-0.943) in the training and test cohorts, respectively. CONCLUSIONS: Incorporating CEUS and SMI features into an ML approach may enhance the diagnostic capacity for distinguishing molecular subtypes of BC.
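The partitioning and evaluation scheme described in the abstract (a random 7:3 split of 193 lesions, fivefold cross-validation repeated five times, and AUC as the performance metric) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `split_7_3`, `repeated_kfold`, and `auc`, the random seeds, the unstratified shuffling, and the rounding-up of the test-set size are all hypothetical choices. The abstract does not specify these details, nor how the confidence intervals were computed, so none of them are reproduced here.

```python
import math
import random


def split_7_3(n_lesions, seed=0):
    """Randomly split lesion indices into training and test sets (about 7:3)."""
    rng = random.Random(seed)
    indices = list(range(n_lesions))
    rng.shuffle(indices)
    # Assumption: the test-set size is 30% of the lesions, rounded up.
    n_test = math.ceil(0.3 * n_lesions)
    return indices[n_test:], indices[:n_test]


def repeated_kfold(indices, k=5, repeats=5, seed=0):
    """Yield (train_fold, validation_fold) index lists for repeated k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # Deal the shuffled indices into k folds of near-equal size.
        folds = [shuffled[i::k] for i in range(k)]
        for i in range(k):
            validation = folds[i]
            train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield train, validation


def auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# 193 lesions, as in the study; the indices stand in for real lesion records.
train_idx, test_idx = split_7_3(193)
cv_splits = list(repeated_kfold(train_idx, k=5, repeats=5))
print(len(train_idx), len(test_idx), len(cv_splits))  # 135 58 25
```

With these assumptions the split reproduces the reported cohort sizes (135 training, 58 test), and the repeated cross-validation yields 25 train/validation pairs over the training cohort. In a one-vs-rest setting, `auc` would be called once per subtype with that subtype coded as 1 and all others as 0.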