Abstract
MOTIVATION: Molecule representation learning (MRL) maps molecules into a real vector space, and the resulting embeddings serve as input to downstream tasks in biology, chemistry, and computer science. This article introduces chemical synthesis graph learning (CSGL), a framework that enhances MRL by modeling both the atomic structure of molecules and their roles in chemical reactions through a hierarchical graph representation. Molecules are first encoded from their molecular graphs, which capture atomic-level structural information. These embeddings are then refined using a chemical synthesis graph, in which nodes represent sets of reactant and product molecules and edges encode the chemical transformations between them (e.g. changes in molecular structure). CSGL optimizes the embeddings of reactant and product nodes so that they conform to a chemical balance constraint.

RESULTS: Experimental results show that CSGL achieves strong performance on a variety of tasks, including product prediction, reaction classification, and molecular property prediction.

AVAILABILITY AND IMPLEMENTATION: https://github.com/li-2023/CSGL.
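
To make the idea of a chemical balance constraint more concrete, the sketch below shows one way such a constraint could be expressed. It is illustrative only and is not taken from the CSGL repository. It assumes that a molecule set is embedded as the sum of its members' vectors and that "balance" means the reactant-set and product-set embeddings should coincide; the abstract does not specify either choice. The names `Reaction`, `sum_embeddings`, and `balance_penalty`, as well as the 2-D embedding values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vector = List[float]


@dataclass(frozen=True)
class Reaction:
    """One edge of a toy synthesis graph: reactant set -> product set."""
    reactants: Tuple[str, ...]  # molecule identifiers, here SMILES strings
    products: Tuple[str, ...]


def sum_embeddings(molecules: Tuple[str, ...], emb: Dict[str, Vector]) -> Vector:
    """Embed a molecule set as the element-wise sum of its members' vectors.

    ASSUMPTION: sum pooling is chosen for illustration only.
    """
    dim = len(next(iter(emb.values())))
    total = [0.0] * dim
    for molecule in molecules:
        for i, value in enumerate(emb[molecule]):
            total[i] += value
    return total


def balance_penalty(rxn: Reaction, emb: Dict[str, Vector]) -> float:
    """Squared Euclidean distance between reactant-set and product-set embeddings.

    ASSUMPTION: a penalty of zero is read as "balanced"; the actual CSGL
    objective may differ.
    """
    lhs = sum_embeddings(rxn.reactants, emb)
    rhs = sum_embeddings(rxn.products, emb)
    return sum((a - b) ** 2 for a, b in zip(lhs, rhs))


# Hypothetical 2-D embeddings, hand-picked so the reaction balances exactly.
embeddings = {
    "CC(=O)O": [1.0, 2.0],    # acetic acid
    "CCO": [0.5, -1.0],       # ethanol
    "CC(=O)OCC": [1.0, 0.5],  # ethyl acetate
    "O": [0.5, 0.5],          # water
}

# Esterification: acetic acid + ethanol -> ethyl acetate + water
esterification = Reaction(("CC(=O)O", "CCO"), ("CC(=O)OCC", "O"))
penalty = balance_penalty(esterification, embeddings)
print(penalty)
```

In a training setting, a term of this kind would typically be minimized over many reactions alongside the encoder that produces the per-molecule vectors; here the vectors are fixed by hand purely to show the bookkeeping.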