FastRNA: An efficient solution for PCA of single-cell RNA-sequencing data based on a batch-accounting count model

FastRNA:一种基于批次计数模型的单细胞RNA测序数据主成分分析(PCA)高效解决方案

阅读:1

Abstract

Almost always, the analysis of single-cell RNA-sequencing (scRNA-seq) data begins with the generation of the low dimensional embedding of the data by principal-component analysis (PCA). Because scRNA-seq data are count data, log transformation is routinely applied to correct skewness prior to PCA, which is often argued to have added bias to data. Alternatively, studies have proposed methods that directly assume a count model and use approximately normally distributed count residuals for PCA. Despite their theoretical advantage of directly modeling count data, these methods are extremely slow for large datasets. In fact, when the data size grows, even the standard log normalization becomes inefficient. Here, we present FastRNA, a highly efficient solution for PCA of scRNA-seq data based on a count model accounting for both batches and cell size factors. Although we assume the same general count model as previous methods, our method uses two orders of magnitude less time and memory than the other count-based methods and an order of magnitude less time and memory than the standard log normalization. This achievement results from our unique algebraic optimization that completely avoids the formation of the large dense residual matrix in memory. In addition, our method enjoys a benefit that the batch effects are eliminated from data prior to PCA. Generating a batch-accounted PC of an atlas-scale dataset with 2 million cells takes less than a minute and 1 GB memory with our method.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。