Abstract
Stool-based messenger RNA (mRNA) biomarkers are a promising approach for the diagnosis of colorectal cancer (CRC) and advanced adenoma (AA), but it is unclear which mRNA biomarkers have the most clinical utility. This study aims to partially fill this gap by first ranking genes according to their expression profiles in publicly available RNA-seq tissue datasets. Each gene was ranked on two criteria: how consistently it was differentially expressed across tumors, and its level of expression in tumor tissue. Genes that were both strongly differentially expressed in the majority of tumors and highly expressed received a higher ranking. The top 20 genes from this bioinformatic analysis of tumor and normal colon tissue were then tested on 114 clinical stool samples (CRC N = 33, AA N = 28, controls N = 53). Fourteen of the genes showed significant differential expression in the stool of CRC patients compared to controls (false discovery rate, FDR < 0.05). The Pearson correlation coefficient between tissue and stool expression was 0.57 (p-value = 0.007). The combined performance of the 20 genes in clinical stool samples yielded an area under the receiver operating characteristic curve (AUC) of 0.94 for CRC detection (sensitivity 75.5%, specificity 95%) and an AUC of 0.83 for AA detection (sensitivity 55.8%, specificity 92.6%). Using existing public transcriptomic datasets to identify promising candidate genes can substantially reduce the cost and effort required to screen for clinically useful mRNA biomarkers.
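
The two-criterion ranking described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything below is an assumption made for illustration: the function name `rank_genes`, the 2-fold threshold, the restriction to up-regulation, the equal-weight rank sum, and the toy expression values are all hypothetical and are not the study's actual method or data.

```python
from statistics import median

def rank_genes(tumor, normal, fold=2.0):
    """Rank genes by (a) the fraction of tumors showing up-regulation
    relative to normal tissue and (b) the median tumor expression level.

    tumor, normal: dict mapping gene name -> list of expression values.
    fold: hypothetical threshold; a tumor counts as differentially
          expressed if it exceeds `fold` times the normal median.
    """
    scores = {}
    for gene, tumor_values in tumor.items():
        baseline = median(normal[gene])
        # Criterion 1: share of tumors above the fold-change threshold.
        fraction = sum(v > fold * baseline for v in tumor_values) / len(tumor_values)
        # Criterion 2: typical expression level in tumor tissue.
        scores[gene] = (fraction, median(tumor_values))

    # Rank each criterion separately (position 0 is best), then sum the ranks.
    by_fraction = sorted(scores, key=lambda g: scores[g][0], reverse=True)
    by_level = sorted(scores, key=lambda g: scores[g][1], reverse=True)
    combined = {g: by_fraction.index(g) + by_level.index(g) for g in scores}
    return sorted(combined, key=lambda g: combined[g])

# Toy values, invented purely to exercise the function.
tumor = {
    "GENE_A": [50, 60, 55, 70],      # consistently elevated, moderately expressed
    "GENE_B": [5, 6, 30, 4],         # elevated in only one tumor, low expression
    "GENE_C": [200, 210, 190, 205],  # highly expressed but unchanged
}
normal = {
    "GENE_A": [10, 12, 11],
    "GENE_B": [5, 5, 6],
    "GENE_C": [195, 200, 205],
}

ranking = rank_genes(tumor, normal)
print(ranking)  # ['GENE_A', 'GENE_C', 'GENE_B']
```

In this toy case the gene that is both consistently elevated and reasonably abundant ranks first. An equal-weight rank sum also lets a highly expressed but unchanged gene outrank a weakly differential one, so how the two criteria are weighted is a design choice that a real analysis would need to justify.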
