PlasGO: enhancing GO-based function prediction for plasmid-encoded proteins based on genetic structure

Abstract

BACKGROUND: Plasmids, as mobile genetic elements, play a pivotal role in transferring traits such as antimicrobial resistance within bacterial communities. Annotating plasmid-encoded proteins with the widely used Gene Ontology (GO) vocabulary is a fundamental step in various tasks, including plasmid mobility classification. However, GO prediction for plasmid-encoded proteins faces 2 major challenges: the high diversity of functions and the limited availability of high-quality GO annotations.

RESULTS: In this study, we introduce PlasGO, a tool that uses a hierarchical architecture to predict GO terms for plasmid proteins. PlasGO uses a protein language model to learn the local context within protein sentences and a BERT model to capture the global context within plasmid sentences. Additionally, PlasGO incorporates a self-attention confidence weighting mechanism that allows users to control the precision of its predictions. We evaluated PlasGO and benchmarked it against 7 state-of-the-art tools in a series of experiments, and the results collectively demonstrate its strong performance. PlasGO significantly expanded the annotations of the plasmid-encoded protein database by assigning high-confidence GO terms to over 95% of previously unannotated proteins, with precision of 0.8229, 0.7941, and 0.8870 for the 3 GO categories, respectively, as measured on the novel protein test set.

CONCLUSIONS: PlasGO, a hierarchical tool incorporating protein language models and BERT, significantly expanded plasmid protein annotations by predicting high-confidence GO terms. These annotations have been compiled into a database, which will serve as a valuable resource for downstream plasmid analysis and research.
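As a rough illustration of how a confidence score can be used to trade coverage for precision, the sketch below keeps only GO terms whose predicted probability and confidence both clear user-chosen cutoffs. It is not PlasGO's actual interface: the `GOPrediction` class, the `filter_predictions` function, the cutoff parameters, and all scores are hypothetical and made up for this example.

```python
from dataclasses import dataclass


@dataclass
class GOPrediction:
    term: str           # GO identifier
    probability: float  # predicted probability that the term applies
    confidence: float   # hypothetical per-term confidence score in [0, 1]


def filter_predictions(preds, prob_cutoff=0.5, conf_cutoff=0.0):
    """Keep terms whose probability and confidence both meet the cutoffs."""
    return [
        p.term
        for p in preds
        if p.probability >= prob_cutoff and p.confidence >= conf_cutoff
    ]


# Made-up scores for a single protein, for illustration only.
preds = [
    GOPrediction("GO:0003677", 0.91, 0.95),  # DNA binding
    GOPrediction("GO:0046677", 0.72, 0.40),  # response to antibiotic
    GOPrediction("GO:0006310", 0.35, 0.90),  # DNA recombination
]

lenient = filter_predictions(preds, conf_cutoff=0.0)  # no confidence filter
strict = filter_predictions(preds, conf_cutoff=0.8)   # high-confidence only
```

With these made-up scores, raising the confidence cutoff drops the low-confidence term and leaves a smaller set of predictions, which is the general trade-off a user-adjustable confidence threshold is meant to expose.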
