Abstract
Adenosine triphosphate (ATP), a high-energy phosphate compound, serves as a direct energy source for living organisms. Proteins, composed of amino acids, are fundamental macromolecules and essential building blocks of life. Interactions between proteins and ATP are crucial for biological processes including movement, regulation, and metabolism. Predicting these interactions, and in particular modelling the binding sites, is important for downstream studies, so advances in prediction techniques hold significant value for disease prevention, diagnosis, treatment, and drug design. However, current methods face several challenges: they typically rely on multiple algorithms to extract multilevel features, which are then integrated into a single deep learning model. This approach is inflexible and may lose important information implied in the sequences. In this study, we propose a novel large language model (LLM)-based model, the pretrained fractional-order deep convolution neural network (PFDCNN), to predict protein-ATP binding sites from sequence information. Sequence features are first extracted by a pretrained protein large language model; a deep convolutional neural network trained with fractional-order backpropagation then performs the prediction, and the loss function is modified to control the impact of data imbalance. We trained and tested our model on several protein-ATP binding site datasets. The comparison results show that the PFDCNN generalizes well, achieving accuracies of 0.99 and 0.984 and AUC values of 0.965 and 0.941, respectively, on two widely used protein-ATP datasets, surpassing most existing protein binding site prediction models.
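
The abstract does not state how the loss function is modified. As an illustration only, the sketch below shows one common way to limit the effect of class imbalance in per-residue binding site prediction: a binary cross-entropy in which the rare positive (binding) class is up-weighted. The function names `weighted_bce` and `balanced_pos_weight`, and the choice of weight as the negative-to-positive ratio, are assumptions made for this example and are not taken from the PFDCNN.

```python
import math


def balanced_pos_weight(labels):
    """Return n_negative / n_positive, an assumed heuristic weight for the rare class."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    return (len(labels) - n_pos) / n_pos


def weighted_bce(probs, labels, pos_weight):
    """Mean binary cross-entropy with the positive-class term scaled by pos_weight.

    probs  : predicted binding probabilities, one per residue
    labels : 1 for a binding residue, 0 otherwise
    """
    eps = 1e-12  # clamp probabilities to avoid log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Hypothetical toy sequence: one binding residue out of four.
labels = [1, 0, 0, 0]
probs = [0.5, 0.5, 0.5, 0.5]
w = balanced_pos_weight(labels)
print(w, weighted_bce(probs, labels, w))
```

With this weighting, a missed binding residue contributes as much to the loss as all non-binding residues combined, which discourages a model from predicting the majority class everywhere.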