Abstract
Macrocycles are a promising therapeutic class. However, incorporating heterochiral and non-natural chemical building blocks makes rational design challenging. Because no existing machine learning method is tailored to heterochiral macrocycle design, we developed a novel convolutional autoencoder model that rapidly generates energetically favorable macrocycle backbones for heterochiral design and structure prediction. Our approach surpasses the current state-of-the-art method, Generalized Kinematic Closure (GenKIC), in the Rosetta software suite. Given the absence of large, available macrocycle datasets, we created a custom dataset in-house and in silico. Our model, CyclicCAE, produces energetically stable backbones and designable structures more rapidly than GenKIC. It enables users to perform energy minimization, generate structurally similar or diverse inputs via Markov chain Monte Carlo (MCMC) sampling, and conduct inpainting with fixed anchors or motifs. We propose that this method will accelerate the development of stable macrocycles and thereby speed up macrocycle drug design pipelines.