Abstract
Single-cell RNA sequencing (scRNA-seq) provides transcriptome profiling of individual cells, allowing in-depth study of cellular heterogeneity at single-cell resolution. Cell clustering is a foundational step in scRNA-seq data analysis, but the high dimensionality and frequent dropout events of the data pose major challenges. Although many dedicated clustering methods have been proposed, they often fail to fully explore the underlying data structure. Here, we introduce scMCGF, a new multi-view clustering algorithm based on graph fusion. It uses multi-view data generated from transcriptomic data to learn the consistent and complementary information across different views, ultimately constructing a unified graph matrix for robust cell clustering. Specifically, scMCGF uses two dimensionality-reduction methods (principal component analysis and diffusion maps) to capture both linear and non-linear characteristics of the data. Additionally, it calculates a cell-pathway score matrix to incorporate pathway-level information. These three features, along with the pre-processed gene expression data, form the multi-view data. scMCGF iteratively refines the similarity graph of each view through adaptive learning and learns a unified graph matrix by weighting and fusing the individual similarity graph matrices. The final clustering results are obtained by applying a rank constraint to the Laplacian matrix of the unified graph matrix. Experimental results on 13 real datasets show that scMCGF outperforms eight state-of-the-art methods in clustering accuracy and robustness. Furthermore, biological analysis validates that the clustering results of scMCGF provide a reliable foundation for downstream investigations.
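
The graph-fusion idea summarized above can be illustrated with a deliberately simplified Python sketch. It is not the scMCGF implementation: the toy data, the Gaussian k-nearest-neighbour graphs, the fixed equal view weights, and all function names are assumptions made for illustration, and the adaptive graph learning and rank-constrained optimization are replaced by a plain connected-components search. The replacement relies on a standard fact: the Laplacian of a graph with non-negative weights on n nodes has rank n - c exactly when the graph has c connected components, so a graph satisfying such a constraint yields its clusters directly as components.

```python
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbour similarity graph for one view.

    Gaussian edge weights are an illustrative assumption, not the
    similarity definition used by scMCGF.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # The k closest cells to cell i, excluding cell i itself.
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: dist[i][j])[:k]
        for j in neighbours:
            weight = math.exp(-dist[i][j] ** 2)
            # Keep the graph symmetric.
            graph[i][j] = max(graph[i][j], weight)
            graph[j][i] = max(graph[j][i], weight)
    return graph


def fuse_graphs(graphs, weights):
    """Weighted average of per-view similarity graphs.

    The weights are fixed here (hypothetical placeholders); the paper
    describes learning them iteratively.
    """
    n = len(graphs[0])
    total = sum(weights)
    return [[sum(w * g[i][j] for w, g in zip(weights, graphs)) / total
             for j in range(n)] for i in range(n)]


def connected_components(graph, tol=1e-12):
    """Label each node by its connected component (depth-first search)."""
    n = len(graph)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if graph[i][j] > tol and labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy data: six cells seen through two hypothetical views.
view_a = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # e.g. a 2-D embedding
view_b = [(0.1,), (0.2,), (0.3,), (5.0,), (5.1,), (5.2,)]        # e.g. one pathway score

graphs = [knn_graph(view_a, k=2), knn_graph(view_b, k=2)]
unified = fuse_graphs(graphs, weights=[0.5, 0.5])
labels = connected_components(unified)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this toy setting both views agree, so the fused graph splits cleanly into two components. On real scRNA-seq data the per-view graphs are noisy and disagree, which is why a method of this kind has to learn the graphs and view weights rather than fix them in advance.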