POSA-GO: Fusion of Hierarchical Gene Ontology and Protein Language Models for Protein Function Prediction

POSA-GO:融合分层基因本体论和蛋白质语言模型的蛋白质功能预测

阅读:2

Abstract

Protein function prediction plays a crucial role in uncovering the molecular mechanisms underlying life processes in the post-genomic era. However, with the widespread adoption of high-throughput sequencing technologies, the pace of protein function annotation significantly lags behind that of sequence discovery, highlighting the urgent need for more efficient and reliable predictive methods. To address the problem of existing methods ignoring the hierarchical structure of gene ontology terms and making it challenging to dynamically associate protein features with functional contexts, we propose a novel protein function prediction framework, termed Partial Order-Based Self-Attention for Gene Ontology (POSA-GO). This cross-modal collaborative modelling approach fuses GO terms with protein sequences. The model leverages the pre-trained language model ESM-2 to extract deep semantic features from protein sequences. Meanwhile, it transforms the partial order relationships among Gene Ontology (GO) terms into topological embeddings to capture their biological hierarchical dependencies. Furthermore, a multi-head self-attention mechanism is employed to dynamically model the association weights between proteins and GO terms, thereby enabling context-aware functional annotation. Comparative experiments on the CAFA3 and SwissProt datasets demonstrate that POSA-GO outperforms existing state-of-the-art methods in terms of Fmax and AUPR metrics, offering a promising solution for protein functional studies.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。