Widespread gene fusion artifacts in helminth genome annotations

Abstract

BACKGROUND: Current helminth genomes possess thousands of predicted fusion genes, encoding novel protein domain architectures that are unique to these species. To investigate this, we analyzed 20,313 two-domain proteins annotated in current helminth genomes, of which 10,297 are apparently unique to helminths, and used RNA-seq data from 20 species of helminth to examine their plausibility as true fusion genes. For comparison, we analyzed a set of 400 high confidence, evolutionarily conserved domain fusions that are present in both helminth and non-helminth species.

RESULTS: Our analysis suggests that, in contrast to genuine fusion genes, the majority of helminth-specific fusion genes in the 20 species investigated are likely gene prediction artifacts based on several criteria: (1) they show a lack of correlation between RNA-seq derived expression levels of the first and second “fused” domains, as well as the interdomain region; (2) they have significantly longer interdomain regions; (3) there is significantly less continuity of coverage in their interdomain regions consistent with breakpoints in RNA-seq coverage; and (4) they are generally not supported in de novo transcriptome assemblies.

CONCLUSIONS: Proteins containing novel domain combinations have been included in widely used sequence and protein databases, including WormBase ParaSite and InterPro, but the analyses presented here suggest that many helminth-specific domain fusion proteins are erroneously annotated. These findings emphasize the importance of using RNA-seq data to validate gene predictions in helminth genomes, especially those with unique structures not observed in other species. Given the increasing need to accurately identify helminth-specific proteins as therapeutic targets, the accuracy of proteome annotation in widely used genomic databases is essential.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-026-12589-y.
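As a rough illustration of criteria (1) to (3), the sketch below computes per-domain mean coverage, interdomain length, and the fraction of interdomain bases with any read support, then correlates domain expression across genes. It is not the authors' pipeline: the function names, the half-open coordinate convention, the per-base depth list as input, and the choice of Pearson correlation are all assumptions made for this example.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean; returns 0.0 for an empty sequence."""
    return sum(values) / len(values) if values else 0.0


def region_stats(coverage, dom1, dom2):
    """Summarize RNA-seq support for one predicted two-domain gene.

    coverage: per-base read depth along the predicted transcript (hypothetical input).
    dom1, dom2: (start, end) half-open coordinates of the two domains, dom1 first.
    """
    s1, e1 = dom1
    s2, e2 = dom2
    inter = coverage[e1:s2]  # bases between the two domains
    covered = sum(1 for depth in inter if depth > 0)
    return {
        "dom1_mean": mean(coverage[s1:e1]),
        "dom2_mean": mean(coverage[s2:e2]),
        "inter_len": len(inter),
        # Fraction of interdomain bases with at least one read; a low value
        # is consistent with a breakpoint in coverage.
        "inter_covered_frac": covered / len(inter) if inter else 1.0,
    }


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


# Toy data (invented): one gene with a coverage gap, one with continuous coverage.
gapped = [10] * 5 + [0] * 4 + [2] * 5
continuous = [8] * 14
gapped_stats = region_stats(gapped, (0, 5), (9, 14))
continuous_stats = region_stats(continuous, (0, 5), (9, 14))
```

Across a set of genes, `pearson` could be applied to the lists of `dom1_mean` and `dom2_mean` values; under the abstract's reasoning, genuine fusions would be expected to show correlated values and continuous interdomain coverage, while artifacts would not.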
