Abstract
Protein-DNA interactions are central to life processes such as gene expression and regulation, so accurate prediction of DNA-binding sites on proteins is important for understanding these processes. In this work, we propose a protein-DNA binding site prediction framework, termed Evolutionary Scale Modeling-SE-Connection Pyramidal (ESM-SECP), which integrates a sequence-feature-based predictor with a sequence-homology-based predictor via ensemble learning. The sequence-feature-based predictor takes two types of input features: ESM-2 protein language model embeddings and evolutionary conservation information computed by PSI-BLAST. These features are fused by a multi-head attention mechanism and processed by the newly proposed SE-Connection Pyramidal (SECP) network for prediction. The sequence-homology-based predictor, a sequence-template method, serves as a complementary approach for predicting DNA-binding residues. The two predictors are combined via ensemble learning to improve overall performance. In experiments on the TE46 and TE129 datasets, ESM-SECP outperforms traditional methods on several evaluation metrics, demonstrating its effectiveness for protein-DNA binding site prediction.
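To make the ensemble step concrete, the following is a minimal sketch of how per-residue scores from the two predictors could be combined. The abstract does not specify the combination rule, so the weighted average, the `weight` and `threshold` values, the fallback when no template is available, and the function names `ensemble_scores` and `call_binding_residues` are all illustrative assumptions rather than the ESM-SECP implementation.

```python
from typing import List, Optional


def ensemble_scores(
    feature_probs: List[float],
    template_probs: List[Optional[float]],
    weight: float = 0.5,  # assumed mixing weight, not taken from the paper
) -> List[float]:
    """Combine per-residue binding probabilities from two predictors.

    feature_probs:  scores from the sequence-feature-based predictor.
    template_probs: scores from the sequence-homology-based predictor;
                    None marks residues with no usable template hit.
    """
    if len(feature_probs) != len(template_probs):
        raise ValueError("both predictors must score the same residues")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    combined = []
    for p_feat, p_tmpl in zip(feature_probs, template_probs):
        if p_tmpl is None:
            # Assumption: fall back to the feature-based score alone.
            combined.append(p_feat)
        else:
            combined.append(weight * p_feat + (1.0 - weight) * p_tmpl)
    return combined


def call_binding_residues(scores: List[float], threshold: float = 0.5) -> List[int]:
    """Return 0-based indices of residues whose score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


if __name__ == "__main__":
    feature = [0.2, 0.8, 0.4]
    template = [1.0, None, 0.0]
    scores = ensemble_scores(feature, template, weight=0.5)
    print(call_binding_residues(scores))
```

In this sketch the homology-based score can raise or lower the feature-based score only where a template exists, which is one simple way for the second predictor to act as a complement rather than a replacement.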