Abstract
Colorectal cancer (CRC) is a leading cause of cancer-related mortality worldwide, largely due to the development of distant metastases, which are associated with poor survival outcomes. Early detection and accurate prediction of CRC metastasis can significantly improve patient outcomes. Conventional diagnostic approaches, including imaging and biomarkers, are often limited by suboptimal sensitivity and inter-observer variability. In recent years, machine learning (ML) and deep learning (DL) models have emerged as powerful tools capable of analyzing complex, high-dimensional clinical, imaging, and molecular data to enhance metastasis prediction. This review provides a comprehensive overview of ML and DL approaches for the early prediction of CRC metastasis, highlighting gaps in comparative studies. We explore DL techniques, including convolutional neural networks (CNNs) and alternative approaches, along with their architectures and layer types. The most commonly used CNN models, such as GoogLeNet, VGGNet, ResNet, and U-Net, have demonstrated effectiveness in identifying complex patterns in data. By integrating multi-modal data (imaging, clinical, and histological) and employing transfer learning, these predictive models have improved individualized treatment strategies, leading to enhanced patient outcomes. This review also examines the applications of ML and DL in predicting CRC metastasis to specific sites such as the lymph nodes, liver, lungs, bones, and peritoneum. While traditional ML algorithms, including logistic regression and random forests, remain valuable, DL models incorporating radiomics and transfer learning often achieve superior performance. We also discuss the computational costs and resource implications of deploying ML and DL technologies in clinical contexts.
Challenges such as the availability of high-quality datasets, interpretability, and ethical concerns are examined, and appropriate solutions are discussed. Future research should focus on developing explainable ML/DL models, optimizing computational resources, establishing ethical frameworks, validating model performance across diverse populations, and incorporating molecular and genomic data.