Transformer-Decoder GPT Models for Generating Virtual Screening Libraries of HMG-Coenzyme A Reductase Inhibitors: Effects of Temperature, Prompt Length, and Transfer-Learning Strategies

Abstract

Attention-based decoder models were used to generate libraries of novel inhibitors of the HMG-coenzyme A reductase (HMGCR) enzyme. These deep neural network models were pretrained on previously synthesized drug-like molecules from the ZINC15 database to learn the syntax of SMILES strings and then fine-tuned on a set of ∼1000 molecules that inhibit HMGCR. The number of layers used for pretraining and fine-tuning was varied to find the optimal balance for robust library generation. Virtual screening libraries were also generated at different temperatures and with different numbers of input tokens (prompt lengths) to find the most desirable molecular properties. The resulting libraries were screened against several criteria: IC50 values predicted by a dense neural network (DNN) trained on experimental HMGCR IC50 data, docking scores from AutoDock Vina (via Dockstring), the calculated quantitative estimate of drug-likeness (QED), and Tanimoto similarity to known HMGCR inhibitors. Models with 50/50 or 25/75 pretrained/fine-tuned layer splits, a nonzero sampling temperature, and shorter prompt lengths produced the most robust libraries, and the DNN-predicted IC50 values correlated well with docking scores and statin similarity. Forty-two percent of the generated molecules were classified as statin-like by k-means clustering, with the rosuvastatin-like cluster showing the lowest predicted IC50 values and the lowest (most favorable) docking scores.
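The abstract highlights sampling temperature as a key generation parameter. As a minimal illustrative sketch (not the authors' code; the function name and logits are hypothetical), temperature scaling of a decoder's output logits before sampling the next SMILES token can be written as:

```python
import numpy as np

def sample_token(logits, temperature=1.0, rng=None):
    """Sample a token index from decoder logits at a given temperature.

    temperature < 1 sharpens the distribution (more conservative SMILES),
    temperature > 1 flattens it (more diverse, but riskier strings).
    A temperature of 0 is treated as greedy (argmax) decoding.
    """
    logits = np.asarray(logits, dtype=float)
    if temperature == 0:
        return int(np.argmax(logits))
    scaled = logits / temperature
    scaled -= scaled.max()          # subtract max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()            # softmax over temperature-scaled logits
    rng = rng if rng is not None else np.random.default_rng(0)
    return int(rng.choice(len(probs), p=probs))

# Greedy decoding always picks the highest-logit token.
greedy = sample_token([2.0, 0.5, 0.1], temperature=0)
```

At a very low but nonzero temperature the softmax concentrates almost all probability mass on the argmax token, which is why a nonzero temperature still allows occasional exploration while a temperature of exactly zero collapses generation to a single deterministic string.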
