Abstract
Single-cell RNA sequencing (scRNA-seq) has revolutionized transcriptomic analysis, enabling high-resolution profiling of gene expression in individual cells. However, technical limitations, particularly dropout events, still hinder accurate quantification of gene expression heterogeneity and lead to underestimation of the proportion of cells expressing a given gene. Numerous computational methods have been developed to impute the missing expression signals. In this study, we systematically evaluated eight prominent imputation algorithms (MAGIC, SAVER, scVI, DCA, scBiG, kNN-smoothing, scImpute, and ALRA) using two simulated datasets and a deeply sequenced reference dataset. We defined the conditions under which imputation can improve estimates of cell population sizes and demonstrated its utility in biologically relevant contexts, including identifying cell populations susceptible to SARS-CoV-2 infection and detecting CFTR-expressing cells. Our findings highlight the potential and limitations of current imputation strategies and offer practical recommendations for improving the interpretability and accuracy of scRNA-seq data.