Abstract
BACKGROUND: Protein post-translational modification (PTM) occurs after protein synthesis and plays key roles in many biological processes. Among PTMs, lysine modifications comprise multiple types and have received considerable attention. Most existing computational models predict whether a specific lysine site in a protein sequence carries a given modification type by extracting features from a short peptide segment centered on that site. Consequently, information from the full protein sequence goes unused. RESULTS: In this study, we took a different approach to investigating lysine modifications. A computational model, PLysPTM-HGNN, was designed to identify lysine modification types at the protein level. Full protein sequence information was used to derive three feature types: gene ontology features, large language model features, and position-specific scoring matrix features. These features were refined separately through a linear transformation, a hybrid graph neural network, and a convolutional neural network combiner, after which they were concatenated and passed into a fully connected layer for prediction. In cross-validation, the model achieved an AUROC of approximately 0.84 and an AUPR of approximately 0.68, indicating strong predictive performance. CONCLUSIONS: PLysPTM-HGNN outperformed several existing protein subcellular localization models and models based on traditional multi-label classification algorithms. The model provides a useful tool for studies of lysine modifications.