Abstract
Accurately predicting the residues at which proteins bind small-molecule ligands is crucial for protein functional annotation and for the design of molecular drugs. However, the prediction accuracy of computational methods based on sequence information alone has proven difficult to improve, so identifying small-molecule ligand binding residues remains challenging. Here, we developed a method to identify the binding residues of four small-molecule ligands: ATP, ADP, GDP, and NAD. The method introduces four correlation features derived from sequence information: neighbor correlation, residue pairs, central motifs, and PSSM correlation. It achieved favorable results both on our self-built dataset and on datasets used in previous studies. In independent testing on our dataset, the highest values of sensitivity (Sn), specificity (Sp), accuracy (Acc), and Matthews correlation coefficient (MCC) were observed for GDP (54.93%), ATP (98.68%), NAD (53.41%), and NAD (0.5341), respectively. Ablation experiments confirmed that these correlation features significantly improve the prediction results, indicating that they contribute to the recognition of protein-small molecule binding residues. These results suggest that the selected feature parameters and algorithms are instrumental in building a robust prediction model. The source code for prediction and some results are available at https://github.com/fendouba123/ATP-Program.
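For readers unfamiliar with the four evaluation measures named above, the following minimal sketch shows how Sn, Sp, Acc, and MCC are conventionally computed from per-residue confusion counts. The function name `binding_site_metrics` and the counts in the usage example are hypothetical illustrations, not taken from the study or its repository, and the zero-denominator handling is an assumption of this sketch.

```python
import math


def binding_site_metrics(tp, tn, fp, fn):
    """Compute Sn, Sp, Acc, and MCC from confusion counts.

    tp/fn: binding residues predicted as binding / non-binding.
    tn/fp: non-binding residues predicted as non-binding / binding.
    Hypothetical helper; ratios with a zero denominator are returned as 0.0.
    """
    total = tp + tn + fp + fn
    sn = tp / (tp + fn) if (tp + fn) else 0.0    # sensitivity (recall on binding residues)
    sp = tn / (tn + fp) if (tn + fp) else 0.0    # specificity (recall on non-binding residues)
    acc = (tp + tn) / total if total else 0.0    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Illustrative counts only (not results from the study).
metrics = binding_site_metrics(tp=40, tn=900, fp=10, fn=50)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because binding residues are usually far rarer than non-binding ones, Acc and Sp can be high even when Sn is modest; MCC is commonly reported alongside them because it accounts for all four counts.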