Abstract
BACKGROUND: Deep learning has advanced breast tumor prediction research, but traditional single-modality models capture a limited range of features, which constrains accuracy. PURPOSE: To develop and validate a multimodal deep learning approach that combines mammography and ultrasound imaging to improve breast tumor classification and support clinical decision-making. METHODS: This retrospective study analyzed 663 female patients with breast lesions imaged from 2018 to 2021, including 384 benign and 279 malignant cases. The two-stage prediction model employed improved modality-specific attention mechanisms: efficient channel attention (ECA-Net) for ultrasound and the convolutional block attention module (CBAM) for mammography. The fused features were input into a stacking ensemble module with logistic regression (LR), support vector machine (SVM), random forest (RF), and Extra-Trees (ET) as base learners and a multilayer perceptron (MLP) neural network as the meta-learner. The data were divided into training (n = 464), validation (n = 133), and test (n = 66) sets in a 7:2:1 ratio. RESULTS: The proposed multimodal prediction model-mammography ultrasound (MPM-MU) achieved an area under the receiver operating characteristic (ROC) curve (AUC) of 87.9 ± 0.21%, an improvement of 13.4 and 15.6 percentage points over the attention-enhanced mammography (74.5%) and ultrasound (72.3%) models, respectively. Ablation studies confirmed that both multimodal feature fusion and the attention mechanisms contributed to diagnostic performance. CONCLUSIONS: MPM-MU, with its modality-specific attention mechanisms, distinguished benign from malignant breast tumors more accurately than the single-modality approaches. This approach may help radiologists classify breast lesions more accurately and support clinical decision-making, potentially reducing unnecessary biopsies and improving diagnostic consistency.
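The stacking stage and the 7:2:1 split described in METHODS can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, replaces the fused ECA-Net/CBAM image features with synthetic 32-dimensional vectors, and uses arbitrary hyperparameters. The helper names `split_7_2_1` and `build_stacking_model` are hypothetical. Only the choice of base learners (LR, SVM, RF, ET), the MLP meta-learner, and the 464/133/66 partition come from the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def split_7_2_1(X, y, seed=0):
    """Hold out 66 test and 133 validation samples; the rest is training."""
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=66, stratify=y, random_state=seed)
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=133, stratify=y_rest, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)


def build_stacking_model(seed=0):
    """LR, SVM, RF and ET base learners feeding an MLP meta-learner."""
    base_learners = [
        ("lr", make_pipeline(StandardScaler(),
                             LogisticRegression(max_iter=1000))),
        ("svm", make_pipeline(StandardScaler(),
                              SVC(probability=True, random_state=seed))),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=seed)),
    ]
    # Hidden-layer size and iteration cap are assumptions, not reported values.
    meta_learner = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                 random_state=seed)
    # cv=5: the meta-learner is trained on out-of-fold base predictions.
    return StackingClassifier(estimators=base_learners,
                              final_estimator=meta_learner, cv=5)


# Synthetic stand-in for the fused features: 663 samples with roughly the
# study's class balance (384 benign of 663). Not real patient data.
X, y = make_classification(n_samples=663, n_features=32, n_informative=8,
                           weights=[384 / 663], random_state=0)

train, val, test = split_7_2_1(X, y)
model = build_stacking_model().fit(*train)

val_auc = roc_auc_score(val[1], model.predict_proba(val[0])[:, 1])
test_auc = roc_auc_score(test[1], model.predict_proba(test[0])[:, 1])
print(f"validation AUC = {val_auc:.3f}, test AUC = {test_auc:.3f}")
```

Because the inputs are synthetic, the printed AUC values say nothing about MPM-MU itself. The sketch only shows how base-learner probabilities are passed to a meta-learner and how a 663-case cohort divides into 464, 133, and 66 samples.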