scSemiPLC: a semi-supervised learning framework for annotating single-cell RNA-Seq data by generating pseudo-labels through clustering

Abstract

Single-cell RNA sequencing (scRNA-seq) technology enables researchers to explore the heterogeneity of diverse cell types within complex tissues at single-cell resolution. Cell annotation is a crucial step in scRNA-seq data analysis because it provides biologically meaningful cell identity information for downstream research. Publicly available datasets are proliferating and sequencing data keep growing in scale, so traditional annotation methods that rely on manually matching marker genes have become increasingly cumbersome and time-consuming. As a result, efficient and convenient automated cell annotation methods are gradually becoming mainstream. In this paper, we propose scSemiPLC, a semi-supervised training framework for single-cell annotation that combines clustering-based pseudo-labeling with consistency regularization. Specifically, scSemiPLC uses the existing label information to guide the clustering of unlabeled data. During model training, it assigns pseudo-labels to the unlabeled samples and constrains the model's predictions on perturbed versions of these samples to agree with their pseudo-labels. This strategy addresses the low utilization of unlabeled data caused by the conventional pseudo-labeling paradigm, which keeps only predictions above a fixed, high confidence threshold, and offers a new approach to cell annotation within semi-supervised learning. Experimental results show that scSemiPLC performs strongly in annotation accuracy and stability, in extracting biologically meaningful representations, and in robustness to the number of available cell labels, significantly outperforming both classical automatic annotation methods and mainstream semi-supervised learning methods.

IMPORTANCE: This work proposes a novel cell annotation training framework, scSemiPLC, which substantially improves annotation efficiency and accuracy by fully leveraging unlabeled data. In its semi-supervised learning component, the framework generates pseudo-labels through clustering. It then evaluates the reliability of each pseudo-label and assigns a corresponding weight, thereby balancing the quantity and the quality of the pseudo-labels. This approach offers new insights for automatic cell annotation within semi-supervised learning.
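
The abstract describes the training strategy only at a high level, so the following is a minimal NumPy sketch of the general idea under stated assumptions, not the authors' implementation. Label-guided clustering is approximated here by a seeded k-means variant: each centroid starts at a labeled class mean, and labeled cells stay pinned to their class. A pseudo-label's reliability weight is assumed to be the cell's soft-assignment probability to its cluster. The perturbation (`perturb`, random gene dropout plus Gaussian noise) is hypothetical, and a nearest-centroid predictor stands in for the neural network classifier. All function names are illustrative.

```python
import numpy as np


def sq_dist(x, centroids):
    """Squared Euclidean distance from every cell (row of x) to every centroid."""
    return ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)


def soft_assign(dist, temperature=1.0):
    """Softmax over negative distances -> per-cell cluster membership probabilities."""
    logits = -dist / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def label_guided_clustering(x_lab, y_lab, x_unlab, n_classes, n_iter=20):
    """Seeded k-means (assumption): one centroid per known cell type, initialised at
    the labeled class mean. Labeled cells stay fixed to their class; only unlabeled
    cells are reassigned. Assumes every class has at least one labeled cell."""
    centroids = np.stack([x_lab[y_lab == k].mean(axis=0) for k in range(n_classes)])
    for _ in range(n_iter):
        pseudo = sq_dist(x_unlab, centroids).argmin(axis=1)
        for k in range(n_classes):
            members = np.concatenate([x_lab[y_lab == k], x_unlab[pseudo == k]])
            centroids[k] = members.mean(axis=0)
    dist = sq_dist(x_unlab, centroids)
    return centroids, dist.argmin(axis=1), dist


def reliability_weights(dist, temperature=1.0):
    """Hypothetical weighting: soft-assignment probability of the chosen cluster,
    between 1/K and 1, instead of discarding cells below a fixed high threshold."""
    return soft_assign(dist, temperature).max(axis=1)


def perturb(x, rng, drop_rate=0.1, noise_std=0.1):
    """Hypothetical augmentation: random gene dropout plus small Gaussian noise."""
    keep = rng.random(x.shape) >= drop_rate
    return x * keep + rng.normal(0.0, noise_std, size=x.shape)


def weighted_consistency_loss(probs, pseudo, weights, eps=1e-12):
    """Reliability-weighted cross-entropy between predictions on perturbed cells
    and their cluster pseudo-labels."""
    nll = -np.log(probs[np.arange(len(pseudo)), pseudo] + eps)
    return float((weights * nll).sum() / weights.sum())


rng = np.random.default_rng(0)
# Synthetic toy data: two well-separated "cell types" in a 5-gene space.
x_lab = np.vstack([rng.normal(0, 1, (5, 5)), rng.normal(6, 1, (5, 5))])
y_lab = np.repeat([0, 1], 5)
x_unlab = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(6, 1, (50, 5))])
true_unlab = np.repeat([0, 1], 50)

centroids, pseudo, dist = label_guided_clustering(x_lab, y_lab, x_unlab, n_classes=2)
weights = reliability_weights(dist)
# Stand-in "classifier": nearest-centroid soft predictions on perturbed cells.
probs = soft_assign(sq_dist(perturb(x_unlab, rng), centroids))
loss = weighted_consistency_loss(probs, pseudo, weights)
print(f"pseudo-label accuracy: {(pseudo == true_unlab).mean():.2f}, loss: {loss:.3f}")
```

Every unlabeled cell contributes to the loss, scaled by its weight. Low-confidence cells are therefore down-weighted rather than discarded, which is one way to realise the quantity-versus-quality trade-off described above. In a real pipeline, `probs` would come from the network's forward pass on perturbed inputs, and the loss would be minimised by gradient descent alongside the supervised loss on labeled cells.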