Denoising single-cell RNA-seq data with a deep learning-embedded statistical framework

Abstract

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) provides extensive opportunities to explore cellular heterogeneity but is often limited by substantial technical noise and variability. The prevalence of zero counts, arising from both biological variation and technical dropout events, poses significant challenges for downstream analyses. Existing imputation methods face inherent trade-offs: statistical approaches remain interpretable but have limited capacity to capture complex, non-linear gene expression relationships, whereas deep learning methods are more flexible but are prone to overfitting and lack mechanistic interpretability, particularly when sample sizes are limited.

METHODS: We present ZILLNB (Zero-Inflated Latent factors Learning-based Negative Binomial), a novel computational framework that integrates zero-inflated negative binomial (ZINB) regression with deep generative modeling. ZILLNB employs an ensemble architecture combining an Information Variational Autoencoder (InfoVAE) and a Generative Adversarial Network (GAN) to learn latent representations at the cellular and gene levels. These latent factors serve as dynamic covariates within a ZINB regression framework, with parameters iteratively optimized through an Expectation-Maximization algorithm. This approach systematically separates technical variability from intrinsic biological heterogeneity.

RESULTS: Comparative evaluations across multiple scRNA-seq datasets demonstrate ZILLNB's superior performance. In cell type classification tasks using mouse cortex and human PBMC datasets, ZILLNB achieved the highest Adjusted Rand Index (ARI) and Adjusted Mutual Information (AMI) among the tested methods, with improvements ranging from 0.05 to 0.2 over VIPER, scImpute, DCA, DeepImpute, SAVER, scMultiGAN and ALRA. For differential expression analysis validated against matched bulk RNA-seq data, ZILLNB improved the area under the Receiver Operating Characteristic curve (AUC-ROC) and the area under the Precision-Recall curve (AUC-PR) by 0.05 to 0.3 compared with standard approaches and other imputation methods, with consistently lower false discovery rates. Application to idiopathic pulmonary fibrosis (IPF) datasets revealed distinct fibroblast subpopulations undergoing fibroblast-to-myofibroblast transition, validated through marker gene expression and pathway enrichment analyses.

CONCLUSION: ZILLNB provides a principled framework for addressing technical artifacts in scRNA-seq data while preserving biological variation. The integration of statistical modeling with deep learning enables robust performance across diverse analytical tasks, including cell type identification, differential expression analysis, and rare cell population discovery.
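To make the ZINB component named under METHODS more concrete, the following is a minimal sketch of a zero-inflated negative binomial log-likelihood and of the kind of quantity an EM E-step would compute: the posterior probability that an observed zero is a technical dropout rather than a true zero count. It is an illustration under stated assumptions, not the ZILLNB implementation. The function names and the scalar parameters `mu` (NB mean), `theta` (NB dispersion) and `pi` (zero-inflation probability) are hypothetical; in a regression setting these would depend on covariates such as the learned latent factors, and the abstract does not specify that parameterization.

```python
import math


def nb_log_pmf(y, mu, theta):
    """Log-pmf of a negative binomial with mean mu > 0 and dispersion theta > 0."""
    return (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(y, mu, theta, pi):
    """Log-pmf of a ZINB: a point mass at zero (weight pi) mixed with an NB."""
    nb = nb_log_pmf(y, mu, theta)
    if y == 0:
        # A zero can come from the dropout component or from the NB itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    # A positive count can only come from the NB component.
    return math.log(1.0 - pi) + nb


def dropout_posterior(y, mu, theta, pi):
    """E-step quantity: P(count came from the dropout component | y)."""
    if y > 0:
        return 0.0
    nb_zero = math.exp(nb_log_pmf(0, mu, theta))
    return pi / (pi + (1.0 - pi) * nb_zero)
```

For example, `dropout_posterior(0, mu=2.0, theta=1.5, pi=0.3)` gives the share of an observed zero attributed to dropout under these illustrative parameter values; an M-step would then re-estimate the parameters using these posterior weights.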
