Abstract
The advent of single-cell RNA sequencing (scRNA-seq) has transformed our ability to explore cellular heterogeneity and developmental processes at single-cell resolution. Despite this potential, technical limitations, high costs, and sample scarcity often leave too little scRNA-seq data for effective downstream analysis. In particular, baseline data are often lacking, or the number of training samples is inadequate for building robust computational models. To address these issues, we present GDSim, a novel deep generative network for simulating scRNA-seq data. GDSim uses a label-guided diffusion-based model to capture the complex gene expression dependencies within scRNA-seq data and generates simulated datasets that closely reflect the distribution of the original data. Experimental evaluations demonstrate that GDSim recovers data distribution characteristics better than state-of-the-art methods. Moreover, GDSim maintains high consistency with real data in cell subtype clustering and differential gene expression analysis, offering a powerful tool for scRNA-seq simulation and downstream biological applications.