A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis

Authors: Runxuan Zhang, Richard Kuo, Max Coulter, Cristiane P G Calixto, Juan Carlos Entizne, Wenbin Guo, Yamile Marquez, Linda Milne, Stefan Riegler, Akihiro Matsui, Maho Tanaka, Sarah Harvey, Yubang Gao, Theresa Wießner-Kroh, Alejandro Paniagua, Martin Crespi, Katherine Denby, Asa Ben Hur, Enamul Huq, Michael Jantsch, Artur Jarmolowski, Tino Koester, Sascha Laubinger, Qingshun Quinn Li, Lianfeng Gu, Motoaki Seki, Dorothee Staiger, Ramanjulu Sunkar, Zofia Szweykowska-Kulinska, Shih-Long Tu, Andreas Wachter, Robbie Waugh, Liming Xiong, Xiao-Ning Zhang, Ana Conesa, Anireddy S N Reddy, Andrea Barta, Maria Kalyna, John W S Brown

Abstract

Background: Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and for differential gene and transcript expression analysis. Single-molecule long-read sequencing technologies capture transcript structures more completely, including alternative splicing, transcription start sites, and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation, and incomplete cDNA synthesis.

Results: We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 169,000 transcripts, twice as many as the best current Arabidopsis transcriptome, and includes over 1500 novel genes. Seventy-eight percent of transcripts are from Iso-seq, with accurately defined splice junctions and transcription start and end sites. We develop novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provide a powerful feature for distinguishing correct splice junctions and removing false ones. Stratified approaches identify high-confidence transcription start and end sites and remove fragmentary transcripts that result from degradation. AtRTD3 is a major improvement over existing transcriptomes, as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provides higher resolution of transcript expression profiling and identifies cold-induced differential transcription start and polyadenylation site usage.

Conclusions: AtRTD3 is currently the most comprehensive Arabidopsis transcriptome. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage analysis from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single-molecule sequencing analysis from any species.
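To make the idea of using mismatches around splice junctions more concrete, the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the filtering rule, so the window size (`window=10`), the support threshold (`min_clean_reads=1`), the input layout, and the function names `has_clean_flank` and `filter_junctions` are all assumptions made for illustration. The sketch keeps a junction only if at least one read supports it with no alignment errors close to either splice site.

```python
from collections import defaultdict


def has_clean_flank(error_positions, donor, acceptor, window=10):
    """Return True if no alignment error lies within `window` bases
    of the donor or acceptor site (hypothetical rule, for illustration)."""
    return not any(
        abs(pos - donor) <= window or abs(pos - acceptor) <= window
        for pos in error_positions
    )


def filter_junctions(reads, window=10, min_clean_reads=1):
    """Split junctions into kept and removed sets.

    `reads` is an iterable of (junctions, error_positions) pairs, where
    `junctions` is a list of (donor, acceptor) genomic coordinates and
    `error_positions` lists genomic positions of mismatches or indels
    in that read's alignment. This input layout is an assumption.
    """
    support = defaultdict(int)  # junction -> number of clean supporting reads
    seen = set()                # every junction observed in any read
    for junctions, error_positions in reads:
        for donor, acceptor in junctions:
            seen.add((donor, acceptor))
            if has_clean_flank(error_positions, donor, acceptor, window):
                support[(donor, acceptor)] += 1
    kept = {j for j in seen if support[j] >= min_clean_reads}
    return kept, seen - kept


# Toy data: two reads support junction (100, 200); one read reports a
# shifted junction (103, 200) with errors right next to the donor site.
reads = [
    ([(100, 200)], [50]),        # errors far from the junction
    ([(100, 200)], [198]),       # error near the acceptor site
    ([(103, 200)], [101, 104]),  # errors near the donor site
]
kept, removed = filter_junctions(reads)
print("kept:", sorted(kept))
print("removed:", sorted(removed))
```

In this toy input the shifted junction is removed because its only supporting read has errors beside the splice site, while the other junction is kept on the strength of one clean read. A real pipeline would read these positions from alignment records and would likely tune the window and threshold to the error profile of the data.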
