Abstract
Synthetic polymeric materials underpin fundamental technologies in the energy, electronics, consumer goods, and medical sectors, yet their development still suffers from prolonged design timelines. Although polymer informatics tools have accelerated this process, polymer simulation protocols continue to face significant challenges in the on-demand generation of realistic 3D atomic structures that respect the conformational diversity of polymers. Generative algorithms exist for 3D structures of inorganic crystals, biopolymers, and small molecules, but they have not addressed synthetic polymers because of representation challenges and data set constraints. In this work, we introduce polyGen, a generative model designed specifically for 3D polymer structures that operates from minimal inputs, such as the repeat unit chemistry alone. polyGen combines graph-based encodings with a latent diffusion transformer using positional biased attention for realistic conformation generation. Because the available data set is limited to 3,855 DFT-optimized polymer structures, we train jointly with small molecule data to enhance generation quality. We also establish structure matching criteria to benchmark our approach on this novel problem. polyGen overcomes the limitations of traditional crystal structure prediction methods for polymers, successfully generating realistic and diverse linear and branched conformations, with promising performance even on challenging large repeat units. As an atomic-level proof-of-concept capturing intrinsic polymer flexibility, it marks a transformative capability in material structure generation.