A complete statistical model for calibration of RNA-seq counts using external spike-ins and maximum likelihood theory

利用外部添加和最大似然理论对RNA-seq计数进行校准的完整统计模型

阅读:2

Abstract

A fundamental assumption, common to the vast majority of high-throughput transcriptome analyses, is that the expression of most genes is unchanged among samples and that total cellular RNA remains constant. As the number of analyzed experimental systems increases however, different independent studies demonstrate that this assumption is often violated. We present a calibration method using RNA spike-ins that allows for the measurement of absolute cellular abundance of RNA molecules. We apply the method to pooled RNA from cell populations of known sizes. For each transcript, we compute a nominal abundance that can be converted to absolute by dividing by a scale factor determined in separate experiments: the yield coefficient of the transcript relative to that of a reference spike-in measured with the same protocol. The method is derived by maximum likelihood theory in the context of a complete statistical model for sequencing counts contributed by cellular RNA and spike-ins. The counts are based on a sample from a fixed number of cells to which a fixed population of spike-in molecules has been added. We illustrate and evaluate the method with applications to two global expression data sets, one from the model eukaryote Saccharomyces cerevisiae, proliferating at different growth rates, and differentiating cardiopharyngeal cell lineages in the chordate Ciona robusta. We tested the method in a technical replicate dilution study, and in a k-fold validation study.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。