Abstract
MOTIVATION: Single-cell heterogeneity analysis is challenging because scRNA-seq data are high-dimensional, complex, and noisy, especially when the goal is precise cell type classification. Existing analytical methods often generalize and adapt poorly across biological contexts, which biases the identification of cell subpopulations and hinders a comprehensive understanding of diseases, therapeutic responses, and biological processes.

RESULTS: To address these issues, we propose scKD, a method that integrates a hybrid neighbourhood-enhanced comparative learning model with a self-knowledge distillation strategy. scKD improves clustering accuracy and identifies both major cell types and rare cell subtypes. Extensive evaluations on multiple real-world datasets show that scKD achieves superior performance in subpopulation identification, clustering stability, and robustness. These results suggest that scKD is a powerful and reliable tool for analyzing single-cell transcriptomic data, facilitating deeper insights into cellular heterogeneity.

AVAILABILITY: All datasets used in this study are publicly available at https://zenodo.org/records/15412380, and detailed information about each dataset is provided in Supplementary Table 1. The source code is available at https://github.com/A-qlh/sckd.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.