Abstract
BACKGROUND AND OBJECTIVE: Recently, there has been growing interest in the use of deep learning methods within the multi-modal domain of breast cancer research. Integrating multi-modal data for breast cancer prediction can yield a richer and more diverse set of information, leading to more robust prediction outcomes than single-modal approaches. This review comprehensively summarizes the advancements in multi-modal breast cancer research over the past 5 years and critically assesses the related opportunities and challenges, serving as a valuable reference for future studies. The application of deep learning techniques to the processing of multi-modal breast cancer data is discussed in depth, and the latest strategies and potential future directions in this area are examined.
METHODS: A systematic analysis of studies on deep learning methods for breast cancer diagnosis based on multi-modal data was conducted. A comprehensive literature search was performed across PubMed, Web of Science, Cochrane Library, and Google Scholar for studies published between January 2019 and April 2025. To ensure the representativeness of the included research, studies were evaluated according to three aspects: the types of multi-modal data used, the fusion strategies adopted, and their clinical relevance.
KEY CONTENT AND FINDINGS: This review systematically traces the development of deep learning approaches for multi-modal breast cancer data, from foundational to more advanced methodologies. First, the paper categorizes common data types and core tasks related to breast cancer prediction. Subsequently, it classifies multi-modal data fusion strategies into three types (feature-level fusion, decision-level fusion, and hybrid fusion), explains the prediction steps for each category in detail, and compares their effectiveness. Finally, it identifies the common challenges in multi-modal breast cancer research and discusses potential directions for future research.
CONCLUSIONS: At present, although numerous deep learning-based multi-modal methods for breast cancer have been proposed, multi-modal fusion remains in the exploratory stage. Future research should focus on addressing the scarcity of high-quality public datasets, as well as developing more robust network architectures and adaptive fusion strategies to better capture complementary information across modalities.