Abstract
Colorectal cancer (CRC) remains a leading cause of cancer mortality globally, yet current histopathological diagnostics capture only a limited set of features. This study aimed to discover subtle, prognostically significant histomorphological patterns in CRC tissues using unsupervised deep learning. We developed a framework integrating convolutional neural networks with deep clustering and trained it on 23,341 image patches from 493 patients, identifying 30 distinct histomorphological clusters. In univariate and multivariate survival analyses, three clusters (Cluster13, Cluster19, and Cluster24) were consistently associated with patient prognosis. These clusters were integrated with clinical factors (T stage, N stage, and differentiation degree) to construct a prognostic risk model. Patients stratified into high-risk and low-risk groups by the model's predictions showed significant survival differences in both the training set (N = 493) and an independent validation set (N = 2590). Furthermore, logistic regression and multivariate Cox analyses showed that adding the three histomorphological clusters to the clinical factors yielded a modest but statistically significant improvement in predictive performance over clinical factors alone, indicating their complementary prognostic value. This work demonstrates that computational pathology can uncover novel, visually elusive morphological features with independent prognostic value, with the potential to refine CRC patient stratification and inform clinical decision-making.
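As an illustration of how patch-level cluster assignments might feed a patient-level risk model, the following minimal Python sketch aggregates each patient's patches into the fraction falling in Cluster13, Cluster19, and Cluster24, combines those fractions with clinical covariates in a linear score, and splits patients at the median score. The aggregation by patch fraction, the median cutoff, the function names, the numeric covariate encoding, and all weights are assumptions made for illustration only; the abstract does not specify them, and in practice the weights would be estimated by a fitted survival model rather than set by hand.

```python
from collections import Counter, defaultdict
from statistics import median

# Clusters reported in the abstract as associated with prognosis.
PROGNOSTIC_CLUSTERS = (13, 19, 24)

# Hypothetical placeholder weights (signs and magnitudes are NOT from the study).
EXAMPLE_WEIGHTS = {
    "cluster13": 2.0, "cluster19": -1.0, "cluster24": 1.0,
    "t_stage": 0.5, "n_stage": 0.5, "differentiation": 0.25,
}


def cluster_fractions(patch_assignments):
    """Return {patient_id: {cluster_id: fraction of that patient's patches}}.

    patch_assignments is an iterable of (patient_id, cluster_id) pairs,
    one pair per image patch. Assumed aggregation scheme, not the paper's.
    """
    counts = defaultdict(Counter)
    for patient_id, cluster_id in patch_assignments:
        counts[patient_id][cluster_id] += 1
    fractions = {}
    for patient_id, counter in counts.items():
        total = sum(counter.values())
        fractions[patient_id] = {c: counter[c] / total for c in PROGNOSTIC_CLUSTERS}
    return fractions


def risk_score(cluster_frac, clinical, weights):
    """Linear risk score over cluster fractions plus numeric clinical covariates."""
    features = {f"cluster{c}": v for c, v in cluster_frac.items()}
    features.update(clinical)
    return sum(weights[name] * value for name, value in features.items())


def stratify(scores):
    """Label each patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
```

A median split is only one possible cutoff. It is used here because it needs no outcome data, whereas a cutoff tuned on survival outcomes would have to be fixed on the training set before being applied to a validation cohort.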