Abstract
Protein succinylation is a vital post-translational modification that regulates diverse cellular processes. Accurate identification of succinylation sites is crucial for understanding protein function and for developing targeted drugs. In this study, we propose an intelligent computational model, iSucc-SnCNs, which encodes protein sequences using a ProtGPT2-based protein language model. Structural representations are derived from the substitution matrix representation (SMR) and the position-specific scoring matrix (PSSM), from which SMR-HOG, SMR-DCT, and PSSM-DWT features are extracted. The BTGA+KNN algorithm then selects the top-ranked features from the hybrid feature vector, and a self-normalized capsule neural network (Sn-CapsNet) is trained on this BTGA-selected optimal feature set. The proposed iSucc-SnCNs achieved an accuracy of 92.92% and an AUC of 0.96, outperforming traditional models by 17%. On two independent datasets (Ind-I and Ind-II), iSucc-SnCNs improved performance by approximately 13% and 2%, respectively, demonstrating its ability to generalize. These results highlight iSucc-SnCNs as a robust and efficient framework for large-scale succinylation site prediction and for protein function analysis in drug discovery.
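To make the matrix-based descriptors more concrete, the following is a minimal NumPy sketch of DCT- and DWT-style features computed from an L x 20 residue-window matrix. It is not the authors' implementation: the 31-residue window, the orthonormal 2-D DCT-II with a retained 4 x 4 low-frequency block, the one-level Haar wavelet, and the mean/standard-deviation summary statistics are all illustrative assumptions, and the function names (`dct_matrix`, `dct_features`, `haar_dwt_features`) are hypothetical. A random matrix stands in for a real SMR or PSSM window; in the described pipeline, the DCT would be applied to the SMR and the DWT to the PSSM.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (n x n), so that C @ C.T == I."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row uses the 1/sqrt(n) scaling
    return c

def dct_features(mat: np.ndarray, keep: int = 4) -> np.ndarray:
    """2-D DCT of an L x 20 matrix; keep the top-left keep x keep low-frequency block.

    The block size `keep` is an assumption, not a value from the paper.
    """
    rows, cols = mat.shape
    coeffs = dct_matrix(rows) @ mat @ dct_matrix(cols).T
    return coeffs[:keep, :keep].ravel()

def haar_dwt_features(mat: np.ndarray) -> np.ndarray:
    """One-level Haar DWT down each column, summarized by mean and std.

    The Haar wavelet and the chosen statistics are assumptions.
    """
    if mat.shape[0] % 2:  # pad odd-length windows by repeating the last row
        mat = np.vstack([mat, mat[-1:]])
    even, odd = mat[0::2], mat[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass (approximation) coefficients
    detail = (even - odd) / np.sqrt(2.0)  # high-pass (detail) coefficients
    stats = [approx.mean(0), approx.std(0), detail.mean(0), detail.std(0)]
    return np.concatenate(stats)

# Hypothetical usage: a random 31 x 20 matrix as a placeholder for a real window.
window = np.random.default_rng(0).normal(size=(31, 20))
features = np.concatenate([dct_features(window), haar_dwt_features(window)])
print(features.shape)  # (96,)
```

Keeping only low-frequency DCT coefficients compresses a variable-detail matrix into a fixed-length vector. The Haar approximation and detail statistics summarize smooth trends and local changes along the window, respectively.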