Gated-GPS: enhancing protein-protein interaction site prediction with scalable learning and imbalance-aware optimization

Abstract

In protein-protein interaction site (PPIS) prediction, existing machine learning models are trained on small datasets, which limits their predictive accuracy on unseen proteins. Class imbalance further hinders performance: in protein complexes, binding residues make up only a small fraction of all residues. To address these challenges, we constructed a training dataset 9× larger than previous benchmarks by filtering the latest protein-protein complex data, improving diversity and generalization. We propose Gated-GPS, a Graph Transformer model with a novel gating mechanism designed to leverage this expanded dataset effectively. We also combine cross-entropy loss with Tversky loss to adjust sensitivity to positive and negative samples, mitigating class imbalance by emphasizing the underrepresented binding residues. Experimental results show that Gated-GPS outperforms state-of-the-art (SOTA) models on four test sets. Notably, on the UBTest dataset, designed to evaluate generalization to unbound proteins, our method improves MCC and AUPRC by 18.5% and 21.4%, respectively, over the previous SOTA. In a case study of snake venom toxin-protein interactions, our model accurately identified interaction sites, demonstrating its potential for therapeutic design and for advancing the understanding of complex protein interactions.
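
The combination of cross-entropy and Tversky loss mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function names, the additive form with a `weight` factor, and the values `alpha=0.3` and `beta=0.7`. The Tversky index is taken in its usual form, TP / (TP + alpha·FP + beta·FN), computed on soft per-residue probabilities; choosing beta larger than alpha penalizes missed binding residues more heavily than false alarms.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over residues (labels: 1 = binding site)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def tversky_loss(probs, labels, alpha=0.3, beta=0.7, eps=1e-7):
    """1 - Tversky index, using soft counts of TP, FP and FN.

    alpha weights false positives and beta weights false negatives;
    both default values are illustrative, not taken from the paper.
    """
    tp = sum(p * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)


def combined_loss(probs, labels, weight=1.0, alpha=0.3, beta=0.7):
    """Assumed additive combination: cross-entropy + weight * Tversky loss."""
    return binary_cross_entropy(probs, labels) + weight * tversky_loss(
        probs, labels, alpha, beta
    )


if __name__ == "__main__":
    # Toy protein with 1 binding residue out of 5 (imbalanced).
    labels = [1, 0, 0, 0, 0]
    probs = [0.4, 0.1, 0.2, 0.05, 0.1]
    print("cross-entropy:", binary_cross_entropy(probs, labels))
    print("Tversky loss: ", tversky_loss(probs, labels))
    print("combined:     ", combined_loss(probs, labels))
```

With alpha = beta = 0.5 the Tversky index reduces to the Dice coefficient, so the two parameters can be read as a tunable departure from Dice toward higher recall on the minority class.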
