scEVOLVE: cell-type incremental annotation without forgetting for single-cell RNA-seq data

scEVOLVE:用于单细胞RNA测序数据的细胞类型增量注释,避免遗忘

阅读:1

Abstract

The evolution in single-cell RNA sequencing (scRNA-seq) technology has opened a new avenue for researchers to inspect cellular heterogeneity with single-cell precision. One crucial aspect of this technology is cell-type annotation, which is fundamental for any subsequent analysis in single-cell data mining. Recently, the scientific community has seen a surge in the development of automatic annotation methods aimed at this task. However, these methods generally operate at a steady-state total cell-type capacity, significantly restricting the cell annotation systems'capacity for continuous knowledge acquisition. Furthermore, creating a unified scRNA-seq annotation system remains challenged by the need to progressively expand its understanding of ever-increasing cell-type concepts derived from a continuous data stream. In response to these challenges, this paper presents a novel and challenging setting for annotation, namely cell-type incremental annotation. This concept is designed to perpetually enhance cell-type knowledge, gleaned from continuously incoming data. This task encounters difficulty with data stream samples that can only be observed once, leading to catastrophic forgetting. To address this problem, we introduce our breakthrough methodology termed scEVOLVE, an incremental annotation method. This innovative approach is built upon the methodology of contrastive sample replay combined with the fundamental principle of partition confidence maximization. Specifically, we initially retain and replay sections of the old data in each subsequent training phase, then establish a unique prototypical learning objective to mitigate the cell-type imbalance problem, as an alternative to using cross-entropy. To effectively emulate a model that trains concurrently with complete data, we introduce a cell-type decorrelation strategy that efficiently scatters feature representations of each cell type uniformly. We constructed the scEVOLVE framework with simplicity and ease of integration into most deep softmax-based single-cell annotation methods. Thorough experiments conducted on a range of meticulously constructed benchmarks consistently prove that our methodology can incrementally learn numerous cell types over an extended period, outperforming other strategies that fail quickly. As far as our knowledge extends, this is the first attempt to propose and formulate an end-to-end algorithm framework to address this new, practical task. Additionally, scEVOLVE, coded in Python using the Pytorch machine-learning library, is freely accessible at https://github.com/aimeeyaoyao/scEVOLVE.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。