Abstract
Single-cell RNA sequencing (scRNA-seq) enables transcriptomic profiling at single-cell resolution, providing valuable insights into cellular diversity across tissues, developmental stages, and diseases. However, accurately identifying cell types remains challenging due to the high dimensionality, sparsity, and noise inherent in scRNA-seq data. To address these challenges, we introduce scAURA (single cell Alignment- and Uniformity-based Graph Debiased Contrastive Representation Architecture), a unified framework that integrates graph debiased contrastive learning with self-supervised clustering. We evaluated scAURA on 18 real single-cell datasets collected from six sequencing platforms and spanning diverse tissues and cell types in human and mouse. scAURA outperformed all state-of-the-art (SOTA) methods on nine datasets in terms of Adjusted Rand Index (ARI) and on eight datasets in terms of Normalized Mutual Information (NMI). Averaged over all datasets, scAURA achieved ranks of 2.28 (ARI) and 2.39 (NMI) when compared with 13 SOTA methods, indicating consistently strong performance across datasets. scAURA was also robust to dropout noise, maintaining stable clustering performance under increasing sparsity levels. Furthermore, on an external single-cell Alzheimer's disease dataset, scAURA accurately clustered different cell types, identified novel cell type-specific marker genes, and inferred their potential transcriptional regulators. The source code and datasets are available at https://github.com/bozdaglab/scAURA. CONTACT: Serdar.Bozdag@unt.edu.