Abstract
BACKGROUND: The blood-brain barrier (B3) acts as a membrane that poses a major obstacle to treating central nervous system (CNS) disorders. B3-penetrating peptides (B3PPs) play a significant role in delivering therapeutic drugs for a wide range of disorders such as multiple sclerosis, Parkinson's disease, and Alzheimer's disease. Accurate identification of such drug agents is therefore important for treating these diseases. Computational methods are generally more cost-effective and faster than conventional wet-lab methods for predicting B3PPs. We therefore developed a novel deep learning-based predictor, DeepB3Pred, that accurately discriminates B3PPs from non-B3PPs using sequence data.

RESULTS: The proposed method uses three types of novel features: pseudo residue energy content matrix (PseRECM), graphical and statistical-based feature engineering (GSFE), and composition, transition, and distribution (CTD)-based features. These features capture energy-, graphical-, and composition-based properties of the primary peptide sequences. Class imbalance (data skewness), an inevitable issue in this data, was addressed by random undersampling. The extracted features were fed into deep learning models, namely a stacked bidirectional gated recurrent unit (BiGRU) and Deep Forest, and into machine learning models, namely CatBoost and a support vector machine (SVM). The BiGRU-based DeepB3Pred model outperformed other state-of-the-art B3PP predictors. Under fivefold cross-validation, the proposed model achieved an accuracy of 0.945, a Matthews correlation coefficient (MCC) of 0.877, and an area under the curve (AUC) of 0.965. On unseen data, it achieved an accuracy of 0.869, an MCC of 0.635, and an AUC of 0.933.

CONCLUSION: We believe this research will accelerate peptide-based drug discovery, particularly for neurological diseases.
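The abstract names two steps without detail: random undersampling to balance B3PPs against non-B3PPs, and the MCC reported alongside accuracy. The sketch below illustrates both under stated assumptions: labels are 1 (B3PP) and 0 (non-B3PP), each sample is an arbitrary feature vector (for example, concatenated PseRECM, GSFE, and CTD features), and the helper names `random_undersample` and `mcc` are hypothetical. This is not the DeepB3Pred implementation.

```python
import math
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Hypothetical helper: randomly drop samples from larger classes until
    every class has as many samples as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n_min = min(len(xs) for xs in by_class.values())
    # Draw n_min samples per class without replacement, then shuffle so
    # classes are interleaved rather than grouped.
    pairs = [(x, y) for y, xs in by_class.items() for x in rng.sample(xs, n_min)]
    rng.shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = B3PP, 0 = non-B3PP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy data (not from the paper): 3 positives, 7 negatives.
    X = [[i] for i in range(10)]
    y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    Xb, yb = random_undersample(X, y)
    print(yb.count(1), yb.count(0))  # 3 3
    print(round(mcc([1, 1, 0, 0], [1, 0, 0, 0]), 4))  # 0.5774
```

Undersampling would normally be applied only to the training split, so that cross-validation and independent-test metrics are computed on untouched data. Library implementations such as imbalanced-learn's `RandomUnderSampler` and scikit-learn's `matthews_corrcoef` perform the same operations.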