Predicting protein-carbohydrate binding sites: a deep learning approach integrating protein language model embeddings and structural features



Abstract

Protein-carbohydrate interactions play an important role in many biological processes and functions, such as inflammation, signal transduction, and cell adhesion. This paper focuses on non-covalent carbohydrate binding sites and aims to build a deep learning model that predicts them. Experimental approaches for identifying these sites are expensive, so computational tools are needed. We explored several sequence-based and structural features and also leveraged protein language model embeddings. We analyzed different architectures and selected the most suitable one for our final prediction model, DeepCPBSite. DeepCPBSite is an ensemble that combines three separate models, each using a different approach (random undersampling, weighted oversampling, or class-weighted loss), all built on the ResNet+FNN architecture. We built separate datasets from three sources: RCSB, UniProt, and CASP. We also compared structural features extracted from structures predicted by AlphaFold and ESMFold in the context of our prediction tasks. We employed three different feature selection techniques and, after categorizing the proteins by organism, performed a SHAP (SHapley Additive exPlanations) analysis on the structural features. DeepCPBSite achieved 78.7% balanced accuracy and 59.6% sensitivity on the TS53 set, outperforming the second-best competitor, DeepGlycanSite, by 1.16% and 2.94%, respectively. Its F1, MCC, and AUPR scores also exceeded those of other state-of-the-art methods, with improvements ranging from 3.77% to 47.6%, 3.84% to 32.7%, and 8.18% to 60.21%, respectively.
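Because binding residues are typically far rarer than non-binding ones, the abstract reports balanced accuracy, sensitivity, F1, and MCC rather than plain accuracy. The sketch below shows how these four metrics follow from a per-residue confusion matrix, using their standard definitions. The function name `binding_site_metrics` and the toy labels are hypothetical illustrations; they are not taken from the paper's code or from the TS53 set.

```python
import math


def binding_site_metrics(y_true, y_pred):
    """Standard binary metrics from per-residue labels (1 = binding, 0 = non-binding).

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Sensitivity (recall on binding residues) and specificity (recall on the rest).
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # Balanced accuracy averages the two recalls, so the majority class cannot dominate.
    balanced_accuracy = (sensitivity + specificity) / 2

    # F1 is the harmonic mean of precision and recall, written in count form.
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0

    # Matthews correlation coefficient uses all four cells of the confusion matrix.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0

    return {
        "sensitivity": sensitivity,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
        "mcc": mcc,
    }


# Invented toy protein with 10 residues, 2 of which are true binding sites.
toy_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
toy = binding_site_metrics(toy_true, toy_pred)
print(toy)
```

On this toy input, plain accuracy would be 80% even though only half of the binding residues are recovered, which is why imbalance-aware metrics are the more informative comparison for this task.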
