Abstract
BACKGROUND: Association and cooperation among structural domains play an important role in protein function and drug design. Although community-wide deep learning efforts have made single-domain protein structure prediction highly accurate, predicting multi-domain protein structures remains challenging when the evolutionary signal for a given domain pair is weak or the protein structure is large.
RESULTS: To alleviate these challenges, we propose M-DeepAssembly, a protocol for multi-domain protein structure prediction based on a multi-objective protein conformation sampling algorithm. First, inter-domain interactions and full-length sequence distance features are extracted with DeepAssembly and AlphaFold2, respectively. Second, subject to these features, we construct a multi-objective energy model and design a sampling algorithm that explores and exploits the conformational space to generate ensembles. Finally, the output protein structure is selected from the ensembles using our in-house model quality assessment algorithm. On a test set of 164 multi-domain proteins, the average TM-score of M-DeepAssembly is 15.4% and 2.0% higher than that of AlphaFold2 and DeepAssembly, respectively. Notably, the ensembles contain models with higher accuracy that were not selected; these achieve improvements of 20.3% and 6.4% relative to the two baseline methods, respectively. Furthermore, on CASP15 multi-domain targets, M-DeepAssembly shows some performance advantages over the predictions of AlphaFold2.
CONCLUSIONS: M-DeepAssembly provides a distinctive multi-domain protein assembly algorithm that can, to some extent, alleviate the current challenges of weak evolutionary signals and large structures by forming diverse ensembles with a multi-objective protein conformation sampling algorithm. The proposed method contributes to exploring the functions of multi-domain proteins and, in particular, offers new insights into targets with multiple conformational states.
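To illustrate what retaining a diverse ensemble under several objectives can mean, the following is a minimal sketch of non-dominated (Pareto) filtering, a common device in multi-objective optimization. The abstract does not state that M-DeepAssembly uses Pareto dominance, so this is an assumption made purely for illustration. The function names `dominates` and `pareto_front` and the toy energy values are hypothetical; each tuple stands for one conformation scored by two energy terms (for example, an inter-domain term and a full-length distance term), with lower values being better.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is no worse than `b` on every objective
    and strictly better on at least one (lower energy is better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(energies: List[Sequence[float]]) -> List[int]:
    """Return the indices of conformations that no other conformation dominates."""
    return [
        i
        for i, e in enumerate(energies)
        if not any(
            dominates(other, e)
            for j, other in enumerate(energies)
            if j != i
        )
    ]


# Hypothetical (energy_1, energy_2) pairs for four sampled conformations.
toy_energies = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = pareto_front(toy_energies)
print(front)  # indices of the non-dominated conformations
```

In this toy case the third conformation is dropped because the second is better on both terms, while the remaining three represent different trade-offs and would all stay in the ensemble. A separate quality assessment step would then pick the final model from such an ensemble.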