A novel prediction method for protein-DNA binding sites based on protein language model fusion features with SE-connection pyramidal network and ensemble learning

Abstract

Protein-DNA interactions are crucial to life processes such as gene expression and regulation, so accurate prediction of DNA-binding sites on proteins is important for understanding biological activity. In this work, we propose a protein-DNA binding site prediction framework, termed Evolutionary Scale Modeling-SE-Connection Pyramidal (ESM-SECP), which integrates a sequence-feature-based predictor with a sequence-homology-based predictor via ensemble learning. The sequence-feature-based predictor takes two types of input features: ESM-2 protein language model embeddings and evolutionary conservation information computed by PSI-BLAST. These features are fused by a multi-head attention mechanism and processed by the newly proposed SE-Connection Pyramidal (SECP) network for prediction. The sequence-template method, based on sequence homology, serves as a complementary approach for predicting DNA-binding residues. The two predictors are combined via ensemble learning to improve overall performance. In experiments on the TE46 and TE129 datasets, ESM-SECP outperforms traditional methods on several evaluation metrics, demonstrating strong performance in protein-DNA binding site prediction.
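
The abstract does not say how the two predictors are combined, so the following is only a minimal sketch of one common ensemble scheme: a weighted average of per-residue scores followed by a cutoff. The function name `ensemble_binding_probs`, the `weight` and `threshold` parameters, and the use of `None` for residues without a template hit are all hypothetical choices made for illustration, not details taken from ESM-SECP.

```python
from typing import List, Optional


def ensemble_binding_probs(
    feature_probs: List[float],
    template_probs: List[Optional[float]],
    weight: float = 0.5,
    threshold: float = 0.5,
) -> List[int]:
    """Combine per-residue scores from two predictors into binary labels.

    feature_probs:  scores from the sequence-feature-based predictor.
    template_probs: scores from the sequence-homology-based predictor;
                    None marks a residue with no template hit (assumption).
    weight:         hypothetical weight of the feature-based predictor.
    threshold:      hypothetical decision cutoff on the combined score.
    """
    if len(feature_probs) != len(template_probs):
        raise ValueError("both predictors must score the same residues")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    labels = []
    for p_feat, p_tmpl in zip(feature_probs, template_probs):
        if p_tmpl is None:
            # No homologous template: fall back to the feature-based score.
            combined = p_feat
        else:
            # Weighted average of the two per-residue scores.
            combined = weight * p_feat + (1.0 - weight) * p_tmpl
        labels.append(1 if combined >= threshold else 0)
    return labels


# Four residues; the last one has no template-based score.
print(ensemble_binding_probs([0.9, 0.2, 0.4, 0.7], [0.8, 0.1, 1.0, None]))
# prints [1, 0, 1, 1]
```

In this toy input, the third residue is labelled as binding only because the template-based score lifts the average above the cutoff, which illustrates how a homology-based predictor can complement a feature-based one.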
