Abstract
BACKGROUND: Given the role of DNA replication in tumorigenesis, identifying diagnostic and prognostic biomarkers of this process with machine learning approaches may reveal new therapeutic targets and improve prognostic assessment in breast cancer. METHODS: Differentially expressed DNA replication genes in breast cancer were identified from two independent datasets. Support vector machine recursive feature elimination (SVM-RFE) was applied to identify the most informative diagnostic genes. A prognostic gene signature was constructed with LASSO Cox regression and assessed by receiver operating characteristic (ROC) and Kaplan-Meier (KM) analyses. The independence of the signature was evaluated by univariate and multivariate Cox regression, and its prognostic value was assessed in clinical subgroups. Weighted gene co-expression network analysis (WGCNA) was conducted to identify genes co-expressed with the signature genes, followed by GO biological process (BP) and KEGG enrichment analysis. RESULTS: The area under the curve (AUC) values indicated strong performance of SVM-RFE in the training and external validation sets. The genes with the highest SVM-RFE importance scores are candidate diagnostic biomarkers. The prognostic DNA replication-related gene signature consisted of four genes, and patients in the high-risk group showed poor overall survival. The signature showed moderate discrimination based on AUC values and was an independent prognostic factor. Co-expressed genes identified by WGCNA were enriched for cell cycle, chromosome segregation, and DNA replication and repair terms. CONCLUSION: SVM-RFE proved to be a valuable machine learning method for detecting diagnostic genes, and a novel prognostic DNA replication-related gene signature is proposed to predict overall survival in breast cancer.
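
For readers unfamiliar with SVM-RFE, the diagnostic gene selection step can be illustrated with a minimal sketch. It assumes scikit-learn and uses a synthetic matrix in place of real expression data; the sample size, number of features, number of retained features, and SVM settings are illustrative assumptions, not the values used in this study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a samples x genes expression matrix (assumption, not study data).
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=5, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# A linear SVM exposes one weight per feature; RFE refits the model repeatedly
# and removes the feature with the smallest squared weight at each step.
selector = RFE(estimator=SVC(kernel="linear", C=1.0), n_features_to_select=5, step=1)
selector.fit(X_train, y_train)

# Indices of retained features; ranking_ is 1 for retained features,
# and larger values mean the feature was eliminated earlier.
selected = np.flatnonzero(selector.support_)
ranking = selector.ranking_

# Discrimination of the reduced model on held-out samples, summarized by AUC.
scores = selector.decision_function(X_test)
auc = roc_auc_score(y_test, scores)

print("selected feature indices:", selected)
print("held-out AUC:", round(auc, 3))
```

In this sketch the held-out split plays the role of a validation set; an independent dataset, as used in the study, would be passed to `decision_function` in the same way.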