Abstract
Protein carbonylation is the covalent modification of proteins through the attachment of carbonyl groups, which arise from oxidative stress. This modification is biologically significant because it can alter protein function, signaling cascades, and cellular homeostasis. Accurate prediction of carbonylation sites offers valuable insights into the mechanisms underlying protein carbonylation and the pathogenesis of related diseases. Notably, carbonylation sites and ligand interaction sites, both of which are functional sites, share numerous similarities. A survey of existing methods reveals that current computational approaches tend to cross-predict excessively, labeling ligand interaction sites as carbonylation sites. To tackle this unresolved challenge, selective carbonylation sites (SCANS), a novel deep learning-based framework, is introduced. SCANS employs a multilevel attention strategy to capture both local (segment-level) and global (protein-level) features, uses a tailored loss function to penalize cross-predictions (residue-level), and applies transfer learning to improve the specificity of the overall network by leveraging knowledge from a pretrained model. These designs are shown to boost predictive performance, and SCANS outperforms current methods with statistical significance. In particular, results on the benchmark test dataset demonstrate that SCANS consistently achieves low false positive rates, including low rates of cross-prediction. Furthermore, motif analyses and interpretations are conducted to provide novel insights into protein carbonylation sites from various perspectives.