Abstract
Single-cell RNA sequencing (scRNA-seq) has become an indispensable tool for studying biological systems at the cellular level. It has therefore become increasingly important to develop accurate statistical models of scRNA-seq data. While many models have been proposed to characterize transcript expression of individual genes, comparatively little attention has been paid to modeling gene coexpression. Copula modeling offers a flexible approach to modeling gene coexpression by linking models of individual genes together using copula functions. Despite the growing popularity of copula models, their utility for modeling scRNA-seq data has not been thoroughly explored. Here we evaluated six copula models on their ability to model gene coexpression in scRNA-seq data. Using a diverse collection of reference datasets, we evaluated each copula model's accuracy and efficiency in reproducing gene coexpression patterns. Our results show that Gaussian copulas provide the best balance between accuracy and speed: more flexible but more expensive copula models provide only a marginal improvement in accuracy while taking much longer to fit. Vine copulas show promise for achieving high accuracy, but current implementations are unable to scale to the large size of typical scRNA-seq datasets.