Abstract
The development of single-cell RNA sequencing (scRNA-seq) technology provides unprecedented opportunities for elucidating cellular heterogeneity and gene expression. Identifying and discovering cell types through cell clustering is a crucial step in analyzing scRNA-seq data. However, the high dimensionality of the data and frequent dropout events pose great challenges for cell clustering. Here, we propose a novel contrastive clustering framework, scSCCNIA (Similarity-matrix-based Contrastive Clustering with Neighbor Information Aggregation), for accurately identifying cell clusters from scRNA-seq data. scSCCNIA applies a Laplacian filter to aggregate neighbor information, constructs different graph views for data augmentation using Siamese encoders with unshared parameters, and learns latent low-dimensional embeddings via similarity-matrix-based contrastive learning. Comparative analyses on multiple scRNA-seq datasets from different platforms and with varying cell numbers demonstrate that scSCCNIA outperforms existing methods in cell clustering and marker gene identification. Furthermore, scSCCNIA reveals the heterogeneity and functional specificity of various cell types through Gene Ontology (GO) term and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses. Overall, scSCCNIA is an effective algorithm for learning latent features from scRNA-seq data that improves cell type identification accuracy and facilitates downstream analyses.
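The abstract states that neighbor information is aggregated with a Laplacian filter but does not specify how the cell graph is built or what form the filter takes. The following is a minimal sketch under stated assumptions, not the scSCCNIA implementation. It assumes a symmetric k-nearest-neighbor cell graph and the low-pass graph filter (I - k·L_sym)^t, which is common in graph filtering, where L_sym is the symmetric normalized Laplacian of the graph with self-loops. The function names `knn_adjacency` and `laplacian_filter` and the parameters `k`, `k_coef`, and `t` are hypothetical.

```python
import numpy as np


def knn_adjacency(X, k=5):
    """Symmetric binary k-nearest-neighbor adjacency over cells (rows of X).

    Assumption: Euclidean distance on the (preprocessed) expression matrix.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    np.fill_diagonal(dist, np.inf)                    # a cell is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]             # k closest cells per row
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)                         # make the graph undirected


def laplacian_filter(X, A, k_coef=0.5, t=2):
    """Smooth features over the graph: X_smooth = (I - k_coef * L_sym)^t X.

    L_sym = I - D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I.
    With k_coef = 0.5 the filter's eigenvalues lie in (0, 1], so it
    attenuates high-frequency (neighbor-inconsistent) signal.
    """
    n = A.shape[0]
    A_hat = A + np.eye(n)                             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    H = np.eye(n) - k_coef * L_sym
    X_smooth = X.copy()
    for _ in range(t):                                # apply the filter t times
        X_smooth = H @ X_smooth
    return X_smooth


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy log-transformed count matrix: 50 cells x 20 genes (synthetic data).
    X = np.log1p(rng.poisson(1.0, size=(50, 20)).astype(float))
    A = knn_adjacency(X, k=5)
    X_smooth = laplacian_filter(X, A)
    print(X_smooth.shape)
```

Each smoothed cell profile is a weighted mix of its own profile and those of its graph neighbors, which is one way to dampen dropout noise before encoding. The choice of graph, `k_coef`, and filter order `t` in the actual method may differ.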