Revisiting the functional annotation of TriTryp using sequence similarity tools

Abstract

Trypanosomatids are the causative agents of deadly diseases in humans and livestock. Because trypanosomatids are phylogenetically distant from model organisms, many of their genes remain unannotated. Manual functional annotation is time-consuming, which makes automated functional annotation tools important. Developing such tools is an active research area, and multiple tools are available for the task. PANNZER2 is an automated functional annotation tool that relies solely on the sequence similarity of the query to annotated proteins. We applied PANNZER2 to Trypanosoma brucei, the most studied trypanosomatid, to see whether it could improve our knowledge of gene function. Even though databases such as TriTrypDB already provide automated annotations from tools like InterPro2GO, PANNZER2 made surprisingly confident predictions for some hypothetical proteins in T. brucei. In this study, we identify gaps in these annotations that arise because TriTrypDB's automated annotation process does not employ pairwise sequence alignment tools. Our findings demonstrate that pairwise sequence alignment can annotate a significant number of proteins even under stringent cutoffs. Additionally, we discovered that adjusting the open reading frames of certain genes yields sequences with greater sequence signature coverage (the length covered by at least one sequence signature) than the original sequences. This increased coverage suggests that these genomic fragments could be pseudogenes. To facilitate further exploration, we developed a script that helps identify potential pseudogenes within an organism's genome. We extended all our analyses to Trypanosoma cruzi and Leishmania major to assess the impact of this approach across species.
Our study demonstrates that pairwise sequence alignment, even with stringent cutoffs, attributes 2986, 3953, and 3798 new GO terms to the genomes of T. brucei, T. cruzi, and L. major, respectively. Additionally, we found that 210, 239, and 29 genes exhibit increased sequence signature coverage following frame correction, suggesting the presence of pseudogenes.
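
The abstract defines sequence signature coverage as the length covered by at least one sequence signature. As a rough illustration only, that definition can be computed as the size of the union of the signature hit intervals on a protein. The sketch below is not the authors' script: the function names are hypothetical, and it assumes each hit is given as a 1-based, inclusive (start, end) pair of residue positions.

```python
from typing import Iterable, List, Tuple

# Assumption: a signature hit is a 1-based, inclusive (start, end) residue range.
Interval = Tuple[int, int]


def merge_intervals(hits: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent hits into disjoint intervals."""
    merged: List[Interval] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1] + 1:
            # The hit overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def signature_coverage(hits: Iterable[Interval]) -> int:
    """Count residues covered by at least one signature hit."""
    return sum(end - start + 1 for start, end in merge_intervals(hits))


def coverage_gain(original_hits: Iterable[Interval],
                  corrected_hits: Iterable[Interval]) -> int:
    """Hypothetical helper: coverage after frame correction minus coverage before."""
    return signature_coverage(corrected_hits) - signature_coverage(original_hits)
```

Under these assumptions, a positive `coverage_gain` for a gene would correspond to the situation the abstract describes, where the frame-corrected sequence is covered by signatures over a greater length than the original sequence. How the hits are obtained and which gain would count as meaningful are left open here.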
