TAGET: a toolkit for analyzing full-length transcripts from single molecular sequencing



Abstract

Single-molecule Real-time Isoform Sequencing (Iso-seq) of transcriptomes by PacBio can generate very long and accurate reads, thus providing an ideal platform for full-length transcriptome analysis. A number of computational tools have been developed for long-read sequencing data. However, integrated computational frameworks for analyzing Iso-seq data are still lacking. We present a Toolkit for Analyzing full-length GEne Transcripts (TAGET) for Iso-seq. Starting from polished high-quality transcripts (circular consensus sequences, or CCSs), TAGET first aligns transcripts to the reference genome by integrating alignment results from long and short reads, and further improves splice site predictions using a Convolutional Neural Network (CNN). TAGET then annotates transcripts by comparing them with reference isoform databases and classifies them into seven classes. Finally, TAGET estimates gene or isoform expression and performs differentially expressed gene (DEG) and differential isoform usage (DIU) analyses. We evaluate the performance of TAGET using a public Iso-seq dataset and newly sequenced Iso-seq datasets from tumor patients. TAGET gives significantly more precise novel splice site predictions and enables more accurate discovery of novel isoforms and gene fusions, as confirmed by experimental validation and comparison with RNA-seq data. We identify and experimentally validate ECM1 as a gene with differential isoform usage, and further show that its isoform ECM1b may be a tumor suppressor in laryngocarcinoma. Our results demonstrate that TAGET provides a valuable computational toolkit and can be applied to many full-length transcriptome studies.
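To make the notion of differential isoform usage concrete, the sketch below computes each isoform's share of its gene's total expression in two conditions and reports the change in that share. It is only an illustration of the general idea and is not TAGET's implementation: the function names `isoform_usage` and `usage_shift`, the isoform labels, and the read counts are all hypothetical, and no statistical test is performed.

```python
from typing import Dict


def isoform_usage(counts: Dict[str, float]) -> Dict[str, float]:
    """Return each isoform's fraction of the gene's total expression."""
    total = sum(counts.values())
    if total == 0:
        # Gene not expressed in this condition: usage is defined as 0 here.
        return {iso: 0.0 for iso in counts}
    return {iso: c / total for iso, c in counts.items()}


def usage_shift(cond_a: Dict[str, float], cond_b: Dict[str, float]) -> Dict[str, float]:
    """Return the change in usage fraction (condition B minus condition A)."""
    usage_a = isoform_usage(cond_a)
    usage_b = isoform_usage(cond_b)
    isoforms = sorted(set(usage_a) | set(usage_b))
    # An isoform missing from one condition is treated as having zero usage.
    return {iso: usage_b.get(iso, 0.0) - usage_a.get(iso, 0.0) for iso in isoforms}


# Made-up read counts for one gene with two isoforms (illustrative only).
normal = {"isoform_1": 30, "isoform_2": 70}
tumor = {"isoform_1": 60, "isoform_2": 40}

shift = usage_shift(normal, tumor)
for iso, delta in shift.items():
    print(f"{iso}: change in usage = {delta:+.2f}")
```

A gene can show a large shift in isoform usage even when its total expression is unchanged, which is why DIU analysis is reported separately from DEG analysis.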
