RVboost: RNA-seq variants prioritization using a boosting method

RVboost:一种利用增强方法对RNA-seq变异进行优先级排序的方法

阅读:1

Abstract

MOTIVATION: RNA-seq has become the method of choice to quantify genes and exons, discover novel transcripts and detect fusion genes. However, reliable variant identification from RNA-seq data remains challenging because of the complexities of the transcriptome, the challenges of accurately mapping exon boundary spanning reads and the bias introduced during the sequencing library preparation. METHOD: We developed RVboost, a novel method specific for RNA variant prioritization. RVboost uses several attributes unique in the process of RNA library preparation, sequencing and RNA-seq data analyses. It uses a boosting method to train a model of 'good quality' variants using common variants from HapMap, and prioritizes and calls the RNA variants based on the trained model. We packaged RVboost in a comprehensive workflow, which integrates tools of variant calling, annotation and filtering. RESULTS: RVboost consistently outperforms the variant quality score recalibration from the Genome Analysis Tool Kit and the RNA-seq variant-calling pipeline SNPiR in 12 RNA-seq samples using ground-truth variants from paired exome sequencing data. Several RNA-seq-specific attributes were identified as critical to differentiate true and false variants, including the distance of the variant positions to exon boundaries, and the percent of the reads supporting the variant in the first six base pairs. The latter identifies false variants introduced by the random hexamer priming during the library construction. AVAILABILITY AND IMPLEMENTATION: The RVboost package is implemented to readily run in Mac or Linux environments. The software and user manual are available at http://bioinformaticstools.mayo.edu/research/rvboost/.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。