Molecular Descriptors, Structure Generation, and Inverse QSAR/QSPR Based on SELFIES



Abstract

In conventional molecular design, inverse QSAR/QSPR requires generating many chemical structures and calculating their molecular descriptors. However, there is no one-to-one correspondence between the generated chemical structures and the molecular descriptors. In this paper, molecular descriptors, structure generation, and inverse QSAR/QSPR based on self-referencing embedded strings (SELFIES), a 100% robust molecular string representation, are proposed. A SELFIES string is converted into a one-hot vector, which serves as the SELFIES descriptors x, and an inverse analysis of the QSAR/QSPR model y = f(x), with objective variable y and molecular descriptors x, is conducted. This yields x values that achieve a target y value. SELFIES strings, and hence molecules, are then generated from these x values, which completes the inverse QSAR/QSPR. The SELFIES descriptors and SELFIES-based structure generation are verified using datasets of actual compounds. SELFIES-descriptor-based QSAR/QSPR models are constructed successfully, with predictive abilities comparable to those of models based on other fingerprints. A large number of molecules that have one-to-one relationships with the values of the SELFIES descriptors are generated. Furthermore, in a case study of inverse QSAR/QSPR, molecules with target y values are generated successfully. The Python code for the proposed method is available at https://github.com/hkaneko1985/dcekit.
