Classification of HIV-1 Protease Inhibitors by Machine Learning Methods

利用机器学习方法对HIV-1蛋白酶抑制剂进行分类

阅读:1

Abstract

HIV-1 protease plays an important role in the processing of virus infection. Protease is an effective therapeutic target for the treatment of HIV-1. Our data set is based on a selection of 4855 HIV-1 protease inhibitors (PIs) from ChEMBL. A series of 15 classification models for predicting the active inhibitors were built by machine learning methods, including k-nearest neighors (K-NN), decision tree (DT), random forest (RF), support vector machine (SVM), and deep neural network (DNN). The molecular structures were characterized by (1) fingerprint descriptors including MACCS fingerprints and PubChem fingerprints and (2) physicochemical descriptors calculated by CORINA Symphony. The prediction accuracies of all of the models are more than 70% on the test set; the best accuracy of 83.07% was obtained by model 4A, which was built by the SVM method based on MACCS fingerprint descriptors. Nine consensus models were built with three kinds of different descriptors, which combined all of the machine learning methods using the "consensus prediction". Model C3(a) developed with MACCS fingerprint descriptors showed the highest accuracy on both training set (91.96%) and test set (83.15%). An external validation set including 35 989 compounds from DUD database and 239 active inhibitors from the recent literature was used to verify the performance of our model. The best prediction accuracy of 98.37% was obtained by model 3C, which was built by RF based on CORINA Symphony descriptors. In addition, from the analysis of molecular descriptors, it shows that the aromatic system and atoms related to hydrogen bonding provide important contributions to the bioactivity of PIs.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。