Abstract
BACKGROUND: This study aimed to construct a dual-modal machine learning model that integrates ultrasound radiomics and plasma proteomics for the precise diagnosis of breast cancer. METHODS: Using a multi-source data integration strategy, 10 protein markers and 14 ultrasound radiomics features were selected from the TCGA and CPTAC databases and from a clinical cohort (60 healthy controls, 60 patients with benign breast disease, and 60 patients with breast cancer), based on plasma protein mass spectrometry and ultrasound data. A dual-modal diagnostic model was then constructed using machine learning algorithms. RESULTS: The protein marker model performed well in the primary screening task of distinguishing healthy individuals from patients with breast disease (highest AUC = 0.974), but its diagnostic performance was limited in differentiating benign from malignant disease (AUC < 0.8 under multiple algorithms). The dual-modal model achieved an AUC of 0.938 in differentiating benign from malignant lesions, significantly outperforming both the proteomics-only model (AUC = 0.830) and the radiomics-only model (AUC = 0.841). CONCLUSION: This study confirmed the synergistic diagnostic value of plasma proteins and ultrasound images, providing a new strategy that combines accuracy and accessibility for the stratified diagnosis of breast cancer.