Abstract
BACKGROUND: The progression of cancer is driven by the accumulation of mutations in driver genes. Many studies have aimed to identify cancer driver genes. However, most of them ignore high-order features in biological networks. RESULT: In this study, we propose MLGCN-Driver, a novel method based on multi-layer graph convolutional networks (GCN), to improve driver gene identification. MLGCN-Driver employs a multi-layer GCN with initial residual connections and identity mappings to learn biological multi-omics features within biological networks. In addition, the node2vec algorithm is used to extract topological structure features of the biological network, and these features are fed into a second multi-layer GCN for feature learning. In both networks, the initial residual connections and identity mappings mitigate the over-smoothing of features. Finally, the probability of each gene being a driver gene is calculated from the learned low-dimensional biological features and topological features. CONCLUSION: We applied MLGCN-Driver to a pan-cancer dataset and to cancer type-specific datasets. Experimental results show that MLGCN-Driver performs strongly in terms of the area under the ROC curve (AUC) and the area under the precision-recall curve (AUPRC) when compared with state-of-the-art approaches.
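To make the role of the initial residual connections and identity mappings concrete, the following is a minimal NumPy sketch of one way such a multi-layer GCN can be written. It assumes a GCNII-style update, in which each layer mixes the propagated features with the input features (initial residual, weight `alpha`) and mixes the learned transformation with the identity (identity mapping, weight `beta`). The abstract does not state the exact update rule or hyperparameters, so the function names, the schedule for `beta`, and the default values of `alpha` and `lam` are illustrative assumptions rather than the MLGCN-Driver implementation.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    a_tilde = adj + np.eye(adj.shape[0])
    deg = a_tilde.sum(axis=1)            # >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(p, h, h0, weight, alpha, beta):
    """One layer with an initial residual (alpha) and an identity mapping (beta).

    Assumed form: ReLU( ((1-alpha) P H + alpha H0) ((1-beta) I + beta W) ).
    """
    support = (1.0 - alpha) * (p @ h) + alpha * h0
    out = (1.0 - beta) * support + beta * (support @ weight)
    return np.maximum(out, 0.0)          # ReLU

def deep_gcn(adj, h0, weights, alpha=0.1, lam=0.5):
    """Stack layers; beta shrinks with depth (assumed schedule log(lam/l + 1))."""
    p = normalize_adjacency(adj)
    h = h0
    for layer, w in enumerate(weights, start=1):
        beta = np.log(lam / layer + 1.0)
        h = gcn_layer(p, h, h0, w, alpha, beta)
    return h

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy 4-gene network (hypothetical) with 8-dimensional input features.
    adj = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
    h0 = rng.random((4, 8))
    weights = [rng.normal(scale=0.1, size=(8, 8)) for _ in range(16)]
    print(deep_gcn(adj, h0, weights).shape)
```

Because every layer re-injects `h0` and keeps its transformation close to the identity, the representation at depth 16 still depends on each gene's own input features instead of collapsing toward a common value, which is the over-smoothing effect the abstract refers to.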