MegSite: an accurate nucleic acid-binding residue prediction method based on multimodal protein language model



Abstract

Accurate identification of nucleic acid-binding residues is crucial for understanding protein-nucleic acid interactions, which play a key role in gene expression and its regulation. Despite numerous computational efforts to address this challenge, high accuracy remains difficult to achieve because informative features are hard to extract from proteins. Here, we introduce MegSite, a novel method informed by a multimodal protein language model that integrates discriminative knowledge from protein sequence, structure, and function. This work presents the first integration of ESM3 multimodal features for nucleic acid-binding site prediction. MegSite significantly outperforms existing prediction methods on multiple independent test sets. The Matthews correlation coefficient (MCC) values achieved by MegSite on DNA-129_Test, DNA-181_Test, RNA-117_Test, and RNA-285_Test are 0.567, 0.444, 0.411, and 0.421, respectively, corresponding to improvements of 2.72%, 7.66%, 1.22%, and 6.58% over the second-best method on each set. Notably, MegSite remains robust on proteins with low structural similarity, surpassing previous structure-based methods. Furthermore, the method extends directly to predicted protein structures and to a newly released RNA-binding residue test set while retaining high accuracy, highlighting its broad applicability. Comprehensive experimental results indicate that the superior performance of MegSite is attributable to its effective integration of multimodal protein knowledge.
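The abstract reports performance as the Matthews correlation coefficient, which for per-residue binary labels is (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The following is a minimal sketch of that standard formula, not the authors' evaluation code. The function name `matthews_corrcoef`, the label convention (1 = binding residue, 0 = non-binding residue), and the choice to return 0.0 when the denominator vanishes are assumptions made for illustration.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary per-residue labels.

    Assumed convention (not from the source): 1 = binding, 0 = non-binding.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # binding residue correctly predicted
        elif truth == 0 and pred == 0:
            tn += 1  # non-binding residue correctly predicted
        elif truth == 0 and pred == 1:
            fp += 1  # non-binding residue predicted as binding
        else:
            fn += 1  # binding residue missed

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: report 0.0 when a row or column of the confusion matrix
    # is empty, since the coefficient is undefined in that case.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy labels for four residues, invented purely for illustration.
    truth = [1, 1, 0, 0]
    predicted = [1, 0, 0, 0]
    print(round(matthews_corrcoef(truth, predicted), 3))
```

Because binding residues are typically a small minority of all residues, MCC is a common choice for this task: it uses all four confusion-matrix counts, so a predictor that labels every residue as non-binding scores 0 rather than a misleadingly high accuracy.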
