Abstract
Most bitopic transmembrane proteins associate with one another through interface residues to form dimers, which facilitate or activate specific cellular functions. Accurately identifying the interface residues of a given dimer is therefore crucial for understanding its function, and it remains a challenging task for computational methods. Existing methods fall broadly into two categories: general-purpose methods for dimerization and specialized methods for interface residues. In this study, we develop a machine learning method that combines both approaches by integrating sequential and structural features extracted from predicted structures and various domains. Cross-validation on a benchmark dataset shows that our method, despite using significantly fewer features, outperforms the state-of-the-art methods by more than three percentage points in F1 score. Furthermore, we compare the proposed model on a benchmark dataset against state-of-the-art multimeric structure predictors, including RoseTTAFold2, AlphaFold2Multimer, and PREDDIMER. The proposed model outperforms all of them, highlighting the effectiveness of integrating structural and sequential features within the proposed framework.