Poor Generalization by Current Deep Learning Models for Predicting Binding Affinities of Kinase Inhibitors

当前深度学习模型在预测激酶抑制剂结合亲和力方面泛化能力较差

阅读:1

Abstract

The extreme surge of interest over the past decade surrounding the use of neural networks has inspired many groups to deploy them for predicting binding affinities of drug-like molecules to their receptors. A model that can accurately make such predictions has the potential to screen large chemical libraries and help streamline the drug discovery process. However, despite reports of models that accurately predict quantitative inhibition using protein kinase sequences and inhibitors' SMILES strings, it is still unclear whether these models can generalize to previously unseen data. Here, we build a Convolutional Neural Network (CNN) analogous to those previously reported and evaluate the model over four datasets commonly used for inhibitor/kinase predictions. We find that the model performs comparably to those previously reported, provided that the individual data points are randomly split between the training set and the test set. However, model performance is dramatically deteriorated when all data for a given inhibitor is placed together in the same training/testing fold, implying that information leakage underlies the models' performance. Through comparison to simple models in which the SMILES strings are tokenized, or in which test set predictions are simply copied from the closest training set data points, we demonstrate that there is essentially no generalization whatsoever in this model. In other words, the model has not learned anything about molecular interactions, and does not provide any benefit over much simpler and more transparent models. These observations strongly point to the need for richer structure-based encodings, to obtain useful prospective predictions of not-yet-synthesized candidate inhibitors.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。