Abstract
BACKGROUND: Non-histone lysine acetylation is a widespread protein post-translational modification that regulates almost all key cellular processes, and its dysregulation is closely associated with various human diseases. Precise identification of non-histone acetylation sites is crucial for understanding their biological functions, but existing computational methods face challenges in prediction accuracy, model interpretability, and usability.
RESULTS: Here, we present AIPred (Acetylation Interpretable Prediction), an integrated framework that combines ESM Cambrian protein language model embeddings with diverse bioinformatics features through interpretable machine learning to predict and analyze non-histone acetylation sites. Systematic evaluation demonstrated AIPred's superior performance, with improvements of 16.7%, 19.8%, and 20.8% over the state-of-the-art model in F1-score, Matthews correlation coefficient (MCC), and area under the precision-recall curve (AUPRC), respectively. Using Shapley additive explanations (SHAP) and gradient attribution analysis, we identified the key features and sequence patterns driving model decisions. We also developed a user-friendly online prediction server and a comprehensive prediction database. AIPred analysis of the TDP-43 protein revealed functionally important acetylation sites, including novel predictions consistent with recent experimental findings.
CONCLUSIONS: AIPred provides an accurate, interpretable, and accessible computational framework for predicting non-histone acetylation sites, and is expected to accelerate targeted research on non-histone acetylation-related mechanisms in cellular regulation and disease pathways.