Choice of library size normalization and statistical methods for differential gene expression analysis in balanced two-group comparisons for RNA-seq studies

Abstract

BACKGROUND: High-throughput RNA sequencing (RNA-seq) has become an important analytical tool in molecular biology. Although the utility and importance of this technique have grown, uncertainties remain regarding the proper analysis of RNA-seq data. Of primary concern, there is no consensus on which normalization and statistical methods are most appropriate for analyzing these data. The lack of standardized analytical methods leads to uncertainty in data interpretation and study reproducibility, especially in studies reporting high false discovery rates. In this study, we compared a recently developed normalization method, UQ-pgQ2, with three of the most frequently used alternatives, RLE (relative log expression), TMM (trimmed mean of M-values) and UQ (upper quartile normalization), in the analysis of RNA-seq data. We evaluated the performance of these methods for gene-level differential expression analysis by considering the following factors: 1) normalization combined with either the Wald test from DESeq2 or the exact test or QL (quasi-likelihood) F-test from edgeR; 2) sample sizes in two balanced two-group comparisons; and 3) sequencing read depth.

RESULTS: Using the MAQC RNA-seq datasets with small numbers of sample replicates, we found that UQ-pgQ2 normalization combined with an exact test can achieve better performance in terms of power and specificity in differential gene expression analysis. However, using an intra-group analysis of false positives from real and simulated data, we found that the Wald test performs better than the exact test when the number of sample replicates is large, and that the QL F-test performs best at sample sizes of 5, 10 and 15 for any normalization. The RLE, TMM and UQ methods performed similarly at a given sample size.

CONCLUSION: We found that the UQ-pgQ2 method combined with an exact test or QL F-test is the best choice for controlling false positives when the sample size is small. When the sample size is large, UQ-pgQ2 with a QL F-test is a better choice for type I error control in an intra-group analysis. Based on the simulated data, read depth had minimal impact on differential gene expression analysis.
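
The RLE and UQ methods named above both reduce to one scaling factor per sample, computed from a genes-by-samples count table. As a minimal pure-Python sketch, assuming raw counts and no filtering beyond excluding zeros, the two factors can be computed as follows. The function names are hypothetical, this is not the DESeq2 or edgeR code (edgeR's calcNormFactors, for instance, also accounts for library size and rescales its factors), and it does not implement TMM, UQ-pgQ2 or any of the statistical tests compared in the study.

```python
import math
import statistics


def rle_size_factors(counts):
    """Median-of-ratios (RLE-style) size factors, one per sample.

    counts: genes x samples table given as a list of rows of raw counts.
    Genes with a zero count in any sample are skipped, since their
    geometric mean across samples would be zero.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has positive counts in every sample")
    # Log of each usable gene's geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the gene's geometric mean.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        # The size factor is the median ratio, mapped back from the log scale.
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors


def upper_quartile_factors(counts):
    """Upper quartile (75th percentile) of each sample's nonzero counts."""
    n_samples = len(counts[0])
    factors = []
    for j in range(n_samples):
        nonzero = [row[j] for row in counts if row[j] > 0]
        # method="inclusive" interpolates linearly, like R's default quantile type 7.
        factors.append(statistics.quantiles(nonzero, n=4, method="inclusive")[2])
    return factors


def normalize(counts, factors):
    """Divide every count by its sample's scaling factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy data: sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [20, 40], [30, 60], [40, 80]]
print(rle_size_factors(counts))        # approx [0.7071, 1.4142]
print(upper_quartile_factors(counts))  # [32.5, 65.0]
```

Because sample 2 in the toy table is an exact 2x scaling of sample 1, both methods recover a factor ratio of 2, and RLE normalization makes the two columns identical. Averaging logs rather than multiplying raw counts keeps the geometric mean from overflowing on deeply sequenced libraries.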
