AIPred: comprehensive prediction and analysis of non-histone acetylation via protein language model and interpretable machine learning



Abstract

BACKGROUND: Non-histone lysine acetylation is a widespread protein post-translational modification that regulates almost all key cellular processes, and its dysregulation is closely associated with various human diseases. Precise identification of non-histone acetylation sites is crucial for understanding their biological functions, but existing computational methods remain limited in prediction accuracy, model interpretability, and usability.

RESULTS: Here, we present AIPred (Acetylation Interpretable Prediction), an integrated framework for prediction and analysis that combines ESM Cambrian protein language model embeddings with diverse bioinformatics features through interpretable machine learning. In a systematic evaluation, AIPred outperformed the state-of-the-art model by 16.7%, 19.8%, and 20.8% in F1-score, Matthews correlation coefficient (MCC), and area under the precision-recall curve (AUPRC), respectively. Using Shapley additive explanations and gradient attribution analysis, we identified the key features and sequence patterns that drive model decisions. We also developed a user-friendly online prediction server and a comprehensive prediction database. AIPred analysis of the TDP-43 protein revealed functionally important acetylation sites, including novel predictions consistent with recent experimental findings.

CONCLUSIONS: AIPred provides an accurate, interpretable, and accessible computational framework for predicting non-histone acetylation sites, and is expected to accelerate targeted research on the mechanisms of non-histone acetylation in cellular regulation and disease pathways.
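For readers unfamiliar with two of the metrics named above, the following is a minimal sketch of how F1-score and MCC are conventionally computed from binary confusion-matrix counts. It is not taken from AIPred; the function names and the example counts are hypothetical and serve only as illustration.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; defined as 0.0 when any marginal sum is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts for an imbalanced site-prediction task
    # (few acetylated lysines among many non-acetylated ones).
    tp, tn, fp, fn = 40, 900, 20, 40
    print(f"F1  = {f1_score(tp, fp, fn):.3f}")
    print(f"MCC = {mcc(tp, tn, fp, fn):.3f}")
```

Unlike F1, MCC also uses the true-negative count, which is one reason it is often reported alongside F1 for imbalanced problems such as modification-site prediction.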
