Abstract
Precision oncology relies on predictive biomarkers for selecting targeted cancer therapies. Network-based properties of proteins, together with structural features such as intrinsic disorder, are likely to shape their potential as biomarkers. We therefore designed a hypothesis-generating framework that integrates network motifs and protein disorder to explore their contribution to predictive biomarker discovery. On this basis we developed MarkerPredict, which applies Random Forest and XGBoost machine learning models to three signalling networks, trained on literature evidence-based positive and negative sets comprising 880 target-interacting protein pairs in total. MarkerPredict classified 3670 target-neighbour pairs with 32 different models, achieving leave-one-out cross-validation (LOOCV) accuracies of 0.7-0.96. We defined a Biomarker Probability Score (BPS) as a normalised summative rank of the models. The scores identified 2084 potential predictive biomarkers for targeted cancer therapeutics, of which 426 were classified as biomarkers by all 4 calculations. We detailed the biomarker potential of LCK and ERK1. This study encourages further validation of the high-ranked predictive biomarkers. The MarkerPredict tool, which is available on GitHub, may have a significant impact on clinical decision-making in oncology by supporting predictive biomarker identification.
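To make the idea of a "normalised summative rank" concrete, the following is a minimal sketch of one way such a score could be computed. It is not the MarkerPredict implementation: the abstract does not specify the ranking or normalisation scheme, so the average-rank tie handling, the min-max normalisation, and all names (`average_ranks`, `biomarker_probability_score`, the model and pair labels) are assumptions made for illustration only.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """Rank values in ascending order (1 = lowest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def biomarker_probability_score(
    model_scores: Dict[str, Dict[str, float]]
) -> Dict[str, float]:
    """Sum each pair's per-model rank, then min-max normalise the sums to [0, 1].

    model_scores maps a (hypothetical) model name to {pair_id: predicted probability}.
    Every model is assumed to score the same set of pairs.
    """
    pairs = sorted(next(iter(model_scores.values())))
    rank_sums = {pair: 0.0 for pair in pairs}
    for scores in model_scores.values():
        ranks = average_ranks([scores[pair] for pair in pairs])
        for pair, rank in zip(pairs, ranks):
            rank_sums[pair] += rank

    lowest, highest = min(rank_sums.values()), max(rank_sums.values())
    if highest == lowest:
        # All pairs are ranked identically; no ordering information to normalise.
        return {pair: 0.0 for pair in pairs}
    return {pair: (total - lowest) / (highest - lowest) for pair, total in rank_sums.items()}


# Toy input with made-up probabilities from two hypothetical models.
example_scores = {
    "model_1": {"pair_A": 0.9, "pair_B": 0.5, "pair_C": 0.1},
    "model_2": {"pair_A": 0.8, "pair_B": 0.6, "pair_C": 0.2},
}
bps = biomarker_probability_score(example_scores)
print(bps)
```

In this toy input both models order the pairs identically, so the pair ranked highest by every model receives the maximum score and the pair ranked lowest receives the minimum. Using ranks rather than raw probabilities keeps models with differently calibrated outputs on a common scale, which is one plausible motivation for a rank-based score.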