Abstract
BACKGROUND: An analysis of colorectal cancer epithelial cells identified two intrinsic subtypes, iCMS2 and iCMS3, in addition to the bulk consensus molecular subtypes (CMS1-4). The intrinsic subtypes can be prognostically important and may prove predictive of response to treatment. We present a method for calling the iCMS subtypes that is robust to technical variation, is designed for single-sample applications, and is highly prognostic in unseen data.
RESULTS: A single-sample classifier (SSC) was developed based on non-parametric correlation similarity to gene expression centroids. The centroids were created synthetically by resampling samples with known iCMS classes from the public datasets used to derive the iCMS classification. We selected the subset of iCMS genes (N = 201) with the strongest epithelial expression in colorectal cancer, aiming to reduce unrelated, non-epithelial variation. The SSC assigns the most likely iCMS class from the distribution of classes among the nearest centroids, using either an absolute cutoff or K-nearest-neighbors voting. In the unseen GTR cohort, SSC nearest-class accuracy was 88%, compared with 75.4% for the previously published NTP predictor without correction. Similarly, nearest-class accuracy was 90% in the public E-MTAB-12862 dataset. In extended tests simulating various perturbations, calls remained stable under extensive noise, partial gene loss, low purity, and synthetic iCMS2/3 admixtures. The SSC was also applied to data from the VELOUR trial, for which reference iCMS calls were available. The SSC iCMS calls were prognostic for overall survival (OS) when comparing iCMS2 with iCMS3 (p < 0.00001) and performed at least as well as the reference iCMS calls, which were also prognostic (p = 0.0001). In the E-MTAB-12862 data, the SSC was prognostic in metastatic patients (N = 114) and in a multivariable model across the full cohort (N = 1062) that included stage, grade, age at diagnosis (all of which were prognostic) and CMS. The previously published NTP was not prognostic in this cohort.
CONCLUSIONS: The iCMS-SSC enables robust, single-sample iCMS calling without batch correction, improves resilience to technical and biological perturbations, and retains prognostic signal in clinical-trial data. Its epithelial gene focus and multi-centroid, rank-based design support deployment for screening, stratification, and retrospective biomarker analyses. Computation is significantly faster than with NTP and is parallel-ready. An open-source R implementation is provided (https://github.com/CRCrepository/iCMS.SSC).
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12967-025-07363-9.
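The classification idea summarized under RESULTS (synthetic centroids built by resampling labelled samples, rank-based correlation with each centroid, and a vote among the nearest centroids) can be illustrated with a minimal Python sketch. This is not the published iCMS.SSC R implementation and does not reproduce its API. The function names, the six-gene toy data, the number of centroids per class, the resampling size, and k = 5 are all hypothetical choices made for illustration. Spearman correlation is assumed as the non-parametric similarity, and only the K-nearest-neighbors voting variant is shown, not the absolute cutoff.

```python
import random
from collections import Counter


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def make_centroids(samples, labels, n_per_class, draw_size, rng):
    """Build synthetic centroids by resampling labelled profiles with
    replacement and averaging each draw gene by gene (assumed scheme)."""
    centroids = []
    for cls in sorted(set(labels)):
        pool = [s for s, lab in zip(samples, labels) if lab == cls]
        for _ in range(n_per_class):
            draw = [rng.choice(pool) for _ in range(draw_size)]
            centroids.append((cls, [sum(g) / draw_size for g in zip(*draw)]))
    return centroids


def classify(profile, centroids, k=5):
    """Call the class that wins the vote among the k most correlated centroids."""
    scored = sorted(((spearman(profile, c), cls) for cls, c in centroids),
                    reverse=True)
    votes = Counter(cls for _, cls in scored[:k])
    return votes.most_common(1)[0][0], votes


# Toy data (hypothetical): six "genes", two clearly separated classes.
rng = random.Random(0)


def toy_profile(cls):
    base = [5, 5, 5, 1, 1, 1] if cls == "iCMS2" else [1, 1, 1, 5, 5, 5]
    return [b + rng.gauss(0, 0.5) for b in base]


labels = ["iCMS2"] * 10 + ["iCMS3"] * 10
samples = [toy_profile(lab) for lab in labels]
centroids = make_centroids(samples, labels, n_per_class=10, draw_size=5, rng=rng)

# A single new sample is classified on its own, with no cohort-level scaling.
call, votes = classify([4.8, 5.3, 5.1, 0.7, 1.2, 0.9], centroids, k=5)
print(call, dict(votes))
```

Because the similarity depends only on the within-sample ordering of gene expression values, a monotone rescaling of a sample's profile leaves its call unchanged. That property is one plausible reason a rank-based design can operate on single samples without batch correction.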