ExUTR: a novel pipeline for large-scale prediction of 3'-UTR sequences from NGS data

Abstract

BACKGROUND: The three prime untranslated region (3'-UTR) plays a pivotal role in modulating gene expression by determining the fate of mRNA. Many crucial developmental events, such as mammalian spermatogenesis, tissue patterning, sex determination and neurogenesis, rely heavily on post-transcriptional regulation by the 3'-UTR. However, 3'-UTR biology remains a relatively untapped field, with only limited tools and 3'-UTR resources available. To elucidate how the 3'-UTR regulates gene expression, the 3'-UTR sequences must first be identified. Current 3'-UTR mining tools, such as GETUTR, 3USS and UTRscan, all depend on a well-annotated reference genome or curated 3'-UTR sequences, which hinders their application to the many non-model organisms for which genomes are not available. To address these issues, an automated NGS-based pipeline is urgently needed for genome-wide 3'-UTR prediction in the absence of reference genomes.

RESULTS: Here, we propose ExUTR, a novel NGS-based pipeline that predicts and retrieves 3'-UTR sequences from RNA-Seq experiments and is designed particularly for non-model species lacking well-annotated genomes. The pipeline integrates cutting-edge bioinformatics tools, databases (Uniprot and UTRdb) and novel in-house Perl scripts into a fully automated workflow. Taking transcriptome assemblies as input, it identifies 3'-UTR signals based primarily on the intrinsic features of transcripts, and outputs predicted 3'-UTR candidates together with associated annotations. In addition, ExUTR requires only minimal computational resources and runs on a standard desktop computer with reasonable runtime, making it affordable for most laboratories. We also demonstrate the functionality and extensibility of this pipeline using publicly available RNA-Seq data from both model and non-model species, and further validate the accuracy of the predicted 3'-UTRs using both well-characterized 3'-UTR resources and 3P-Seq data.

CONCLUSIONS: ExUTR is a practical and powerful workflow that enables rapid genome-wide 3'-UTR discovery from NGS data. The candidates predicted through this pipeline will further advance the study of miRNA target prediction, cis elements in 3'-UTRs and the evolution and biology of 3'-UTRs. Its independence from a well-annotated reference genome will dramatically expand its application to a much broader research area, encompassing all species for which RNA-Seq data are available.
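
As a rough illustration of what identifying a 3'-UTR from "the intrinsic features of transcripts" can mean without a reference genome, the Python sketch below takes an assembled transcript, finds its longest forward-strand open reading frame (ORF), and reports the sequence downstream of the stop codon as a 3'-UTR candidate. This is an assumption made for illustration only: the abstract does not describe ExUTR's actual algorithm, its Perl scripts, or its use of Uniprot and UTRdb, and the function names `longest_orf` and `candidate_3utr` are hypothetical.

```python
# Illustrative sketch only; NOT the ExUTR implementation.
# Assumption: the 3'-UTR candidate is whatever follows the stop codon
# of the longest ATG-to-stop ORF on the forward strand of a transcript.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_orf(seq):
    """Return (start, end) of the longest forward-strand ORF, or None.

    `start` is the index of the ATG; `end` is exclusive and includes
    the stop codon. All three forward reading frames are scanned.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or (end - start) > (best[1] - best[0]):
                    best = (start, end)
                start = None  # look for the next ORF in this frame
    return best


def candidate_3utr(seq, min_len=1):
    """Return the sequence downstream of the longest ORF, or None.

    `min_len` is a hypothetical filter on the candidate's length.
    """
    orf = longest_orf(seq)
    if orf is None:
        return None
    utr = seq[orf[1]:]
    return utr if len(utr) >= min_len else None


if __name__ == "__main__":
    # Toy transcript: 2 nt leader, a short ORF, then a downstream tail.
    transcript = "GG" + "ATGAAACCCTAA" + "GCGCAATAAAGT"
    print(longest_orf(transcript))
    print(candidate_3utr(transcript))
```

A real workflow would additionally need to handle transcript orientation, incomplete ORFs and homology support for the coding region, which this sketch deliberately leaves out.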
