Abstract
BACKGROUND: Early recurrence of colorectal cancer liver metastases (CRLM) typically correlates with significantly worse survival. Robust and interpretable approaches are needed to help clinicians identify patients at high risk of recurrence. METHODS: We used CT images, provided in DICOM format, and associated clinical data from 197 patients with CRLM. A total of 993 radiomic features, including shape, texture, and first-order features, were extracted. Eight machine learning models were trained and validated: Random Forest (RF), Multilayer Perceptron (MLP), K-Nearest Neighbors (KNN), Extremely Randomized Trees (ET), AdaBoost, Decision Tree (DT), Gradient Boosting (GB), and Linear Discriminant Analysis (LDA). RESULTS: For predicting tumor recurrence within one year from radiomic features alone, the ET model performed best, with an AUC of 0.9667; RF and GB also performed well, with AUCs of 0.9558 and 0.9227, respectively. When radiomic and clinical features were combined, RF achieved the highest AUC of any model (0.9672, up from 0.9558), followed by GB (0.9646, up from 0.9227), whereas the AUC of ET was 0.9459, lower than with radiomic features alone. CONCLUSION: We developed a CT-based Random Forest model that combines clinical features (e.g., age, carcinoembryonic antigen, bilobar disease) with radiomic features (e.g., the first-order 90th percentile in the high-low-low (HLL) wavelet-decomposed image and Run Entropy from the gray-level run length matrix (GLRLM) in the low-low-low (LLL) wavelet sub-band) to predict early recurrence after hepatectomy in patients with CRLM. This model has the potential to guide personalized postoperative surveillance.
However, the retrospective single-center design and relatively small sample size may limit the generalizability of the findings. Further validation in larger, multi-center cohorts is warranted to confirm the model's clinical utility.
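For readers less familiar with the AUC values used above to rank the models, the following minimal sketch shows one common way to compute it: as the fraction of (recurrence, no-recurrence) patient pairs in which the model assigns the higher risk score to the patient who recurred, with ties counted as half. The function name `roc_auc`, the labels, and the scores are hypothetical toy values chosen for illustration; they are not the study's data, and this is not claimed to be the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case scored higher: correctly ranked pair
            elif p == n:
                wins += 0.5   # tie: counted as half a correct pair
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = recurrence within one year, 0 = no early recurrence.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
# Hypothetical predicted risk scores from two illustrative models.
model_a = [0.9, 0.8, 0.4, 0.3, 0.2, 0.5, 0.1, 0.7]
model_b = [0.6, 0.4, 0.5, 0.5, 0.3, 0.7, 0.2, 0.8]

auc_a = roc_auc(y_true, model_a)
auc_b = roc_auc(y_true, model_b)
print(f"model A AUC = {auc_a:.4f}, model B AUC = {auc_b:.4f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. This pairwise formulation is quadratic in the number of patients, which is acceptable for a cohort of a few hundred; library implementations typically use a faster rank-based computation.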