Abstract
Small ubiquitin-like modifiers (SUMOs) are crucial protein regulators that influence diverse biological processes through covalent modification or non-covalent interaction. SUMOylation, a key post-translational modification (PTM), plays a vital role in cellular regulation. This study presents Hybrid-Sumo, a deep learning-based model that integrates protein structural and sequence features to predict SUMOylation sites. Hybrid-Sumo combines three feature extraction techniques: Half-Sphere Exposure (HSE), Position-Specific Scoring Matrix with Discrete Wavelet Transform (PSSM-DWT), and Bidirectional Encoder Representations from Transformers (BERT). The SHapley Additive exPlanations (SHAP) algorithm is used to select the most informative features, and a Deep Neural Network (DNN) serves as the classifier. Extensive 10-fold cross-validation confirms the effectiveness of Hybrid-Sumo, which achieves 99.74% accuracy on the benchmark datasets and 96.15% and 95.83% on the balanced and imbalanced independent datasets, respectively. These results surpass existing models, improving training accuracy by 1.45% and testing accuracy by 1.90% (balanced) and 0.25% (imbalanced). Hybrid-Sumo is therefore a robust computational tool for accurate prediction of SUMOylation sites, one that can accelerate research on protein function and modification analysis.
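To make the PSSM-DWT descriptor concrete, the following is a minimal sketch of how a wavelet-based feature vector could be derived from a PSSM. It is not the Hybrid-Sumo implementation. The Haar wavelet, the two decomposition levels, the four summary statistics, the 21-residue window, and the function names `haar_dwt` and `pssm_dwt_features` are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    x = np.asarray(signal, dtype=float)
    if x.size % 2:
        # Assumption: odd-length signals are padded by repeating the last value.
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail


def pssm_dwt_features(pssm, levels=2):
    """Summarise each PSSM column by statistics of its wavelet coefficients.

    `pssm` has shape (sequence_length, 20). Each column is treated as a
    1-D signal, decomposed `levels` times, and every coefficient band is
    reduced to (max, min, mean, std). These choices are illustrative only.
    """
    pssm = np.asarray(pssm, dtype=float)
    features = []
    for column in pssm.T:
        approx = column
        for _ in range(levels):
            approx, detail = haar_dwt(approx)
            features.extend([detail.max(), detail.min(), detail.mean(), detail.std()])
        features.extend([approx.max(), approx.min(), approx.mean(), approx.std()])
    return np.array(features)


# Hypothetical input: random scores standing in for a real 21-residue PSSM window.
rng = np.random.default_rng(0)
toy_pssm = rng.integers(-5, 6, size=(21, 20))
vector = pssm_dwt_features(toy_pssm)
print(vector.shape)  # (240,): 20 columns x 3 bands x 4 statistics
```

The resulting vector has a fixed length regardless of sequence length, which is what allows it to be concatenated with other descriptors before feature selection and classification.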