Abstract
Conformational dynamics are often critical for protein function. There is strong interest in deep learning models that can predict the conformational distributions of proteins or design protein structures capable of rich conformational dynamics. Here we report PVQD (Protein Vector Quantization and Diffusion), a method that uses a vector-quantized auto-encoder to learn latent representations of protein backbones, and diffusion in this latent space both to generate backbones and to sample conformations conditioned on native sequences. Comparisons show that PVQD generates backbones with natural-like compositions of secondary structures, loop lengths, and domain sizes. When sampling conformations of natural proteins, PVQD reproduces the experimentally observed structural variations of benchmark proteins better than existing approaches. For K-Ras, KaiB, 4.1G CTD, and D-allose-binding proteins, PVQD captures sequence-dependent effects on functional conformational dynamics. Thus, latent-space diffusion forms a valuable framework that can unify the prediction and design of protein structures while also modeling conformational dynamics.