Abstract
Generative protein modeling provides advanced tools for designing diverse protein sequences and structures. However, accurately modeling the conformational landscape during sequence design remains a critical challenge in two respects: ensuring that the designed sequence reliably folds into the target structure as its most stable conformation, and optimizing the sequence for a fixed input structure that may itself be suboptimal. In this study, we present a systematic analysis of jointly optimizing the sequence-to-structure and structure-to-sequence mappings, which enables us to find optimal solutions for modeling the conformational landscape. We validate our approach against large-scale protein stability measurements, demonstrating that joint optimization is superior both for designing stable proteins with a joint model (TrRosetta and TrMRF) and for achieving high accuracy in stability prediction with joint modeling (half-masked ESMFold pLDDT + ESM2 pseudo-likelihood). We further investigate features of sequences generated by the joint model and find that they exhibit higher frequencies of hydrophilic interactions, which may help maintain both secondary structure registry and pairing, features not captured by structure-to-sequence modeling alone.