Hybridization of SMILES and chemical-environment-aware tokens to improve performance of molecular structure generation

将SMILES表达式与化学环境感知标记相结合,以提高分子结构生成性能

阅读:1

Abstract

The Simplified Molecular Input Line Entry System (SMILES) is one of the most widely adopted molecular representations. However, SMILES notation suffers from limited token diversity and a lack of chemical information within individual tokens. To address these limitations while maintaining its simplicity, we propose a molecular representation method through the hybridization of standard SMILES tokens with Atom-In-SMILES (AIS) tokens, which incorporate local chemical environment information into a single token. This hybrid representation, termed SMI + AIS, combines SMILES and AIS tokens, allowing AIS tokens to differentiate chemical elements based on their chemical context without introducing additional tokens for less frequent elements. Using the SMI + AIS representation, we evaluated its performance by comparing the predefined metric of generated structures in chemical structure generation based on latent space optimization. Compared to standard SMILES, SMI + AIS achieved a 7% improvement in binding affinity and a 6% increase in synthesizability, highlighting its utility in the enhancement of machine learning-based molecular design. Our results demonstrate that the SMI + AIS representation provides a more effective and informative approach to encapsulate chemical context and presents potential for performance enhancement in other machine learning tasks in chemistry.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。