Unraveling learning characteristics of transformer models for molecular design


Abstract

In drug design, transformer networks adopted from natural language processing are applied in a variety of ways. We used sequence-based generative compound design as a model system to explore the learning characteristics of transformers and to determine whether these models learned information relevant for protein-ligand interactions. The analysis reveals that sequence-based predictions of active compounds using transformer models required at least ∼60% of the original test sequences. Moreover, the predictions depended on the sequence and compound similarity between training and test data and on compound memorization effects. The predictions were purely statistically driven, associating sequence patterns with molecular structures, which rationalizes their strict dependence on detectable similarities. Furthermore, the transformer models did not learn target sequence information relevant for ligand binding. While these results do not call sequence-based compound design approaches into question in general, they caution against over-interpretation of transformer models used for such applications.
