Abstract
BACKGROUND: Single-cell RNA sequencing (scRNA-seq) is considered a revolutionary advance in gene expression studies and offers significant benefits across biomedical and clinical applications. However, a high proportion of technical dropouts in scRNA-seq data increases noise and degrades downstream analyses such as cell clustering, differential expression analysis, and cell trajectory inference. Many recent imputation methods use deep learning to recover missing gene expression values in scRNA-seq data. Despite these efforts, existing methods remain limited in capturing local co-expression patterns and in handling the uncertainty involved in distinguishing technical zeros from true biological zeros.

RESULTS: This work proposes scZiva, a novel imputation method for scRNA-seq data based on a Variational Autoencoder (VAE). It introduces a structured probabilistic framework that jointly models dropout uncertainty and statistically induced local gene dependencies. scZiva also adopts a probability-guided selective imputation mechanism that recovers likely technical dropouts while preserving biologically meaningful zeros. The framework is implemented using a Zero-Inflated Negative Binomial (ZINB) likelihood with a convolution-enhanced encoder architecture. Comprehensive experiments on both simulated and real datasets demonstrate the strength of scZiva relative to baseline methods.

CONCLUSION: Compared with five baseline methods, scZiva shows strong and stable performance in most evaluation settings, particularly in recovering missing gene expression values and supporting downstream analyses. It is a promising approach for analyzing scRNA-seq data.
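
To make the ZINB likelihood and the idea of probability-guided selective imputation concrete, the following is a minimal sketch in plain Python. It is not the scZiva implementation: the function names, the mean/dispersion parameterization (`mu`, `theta`), the zero-inflation weight `pi`, the 0.5 threshold, and the rule of replacing a flagged zero with the predicted mean are all assumptions made for illustration. In a VAE setting, `mu`, `theta`, and `pi` would typically be produced by the decoder for each cell-gene entry.

```python
import math


def nb_log_pmf(k, mu, theta):
    """Log-probability of count k under a negative binomial
    with mean mu > 0 and dispersion theta > 0."""
    return (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )


def zinb_log_likelihood(k, mu, theta, pi):
    """Log-likelihood of count k under a ZINB model, where pi in (0, 1)
    is the probability of a zero from the zero-inflation component."""
    if k == 0:
        # A zero can come from the inflation component or from the NB itself.
        nb_zero = math.exp(nb_log_pmf(0, mu, theta))
        return math.log(pi + (1.0 - pi) * nb_zero)
    # A positive count can only come from the NB component.
    return math.log(1.0 - pi) + nb_log_pmf(k, mu, theta)


def dropout_posterior(mu, theta, pi):
    """Posterior probability that an observed zero came from the
    zero-inflation component rather than from the NB component."""
    nb_zero = math.exp(nb_log_pmf(0, mu, theta))
    return pi / (pi + (1.0 - pi) * nb_zero)


def selective_impute(counts, mus, thetas, pis, threshold=0.5):
    """Hypothetical selection rule (an assumption, not the paper's rule):
    replace an observed zero with the predicted mean only when the posterior
    dropout probability exceeds the threshold; keep everything else."""
    imputed = []
    for k, mu, theta, pi in zip(counts, mus, thetas, pis):
        if k == 0 and dropout_posterior(mu, theta, pi) > threshold:
            imputed.append(mu)   # likely technical dropout: fill in
        else:
            imputed.append(k)    # observed count or likely biological zero
    return imputed


if __name__ == "__main__":
    counts = [0, 0, 3]
    mus = [50, 0.05, 4]
    thetas = [2, 2, 2]
    pis = [0.3, 0.1, 0.3]
    print(selective_impute(counts, mus, thetas, pis))
```

In this sketch, a zero observed where the model predicts high expression is treated as a likely technical dropout, because the negative binomial component alone would rarely produce it. A zero observed where predicted expression is near zero is left untouched, since the count distribution already explains it.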