CanID: a robust and accurate RNAseq Expression-based diagnostic classification scheme for pediatric malignancies

Abstract

Cancer subtype classification is critical for precision therapy, and histopathology testing is increasingly augmented with omics-based machine learning classifiers. For pediatric cancer, however, analytical challenges remain regarding the scope and precision of current classifiers and the evolving standardization of subtypes. To address these challenges, we built Cancer Identification (CanID), a stacked ensemble machine learning classification scheme that uses transcriptomic features derived from gene-level RNA sequencing count data as its sole input. CanID was developed primarily from 3203 pediatric cancer samples spanning 13 solid tumor subtypes and 38 hematologic malignancy subtypes, with subtype labels curated without the use of RNA-seq data. In testing on three independent or external data sets, accuracy was 99% for solid tumors and 92-93% for hematologic malignancies. Notably, CanID was able to classify subtypes that are challenging for clinical histology evaluation and was robust to both biological and technical challenges, including differences in data collection protocols, class imbalance, potentially mislabeled training samples, and classes unobserved in training. The high accuracy, robustness, and biological interpretability of this transcriptome-based classification scheme make it a valuable approach for advancing tumor diagnosis and clinically meaningful stratification of tumor types. CanID can be accessed on GitHub at https://github.com/chenlab-sj/CanID.
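
To illustrate the general idea of a stacked ensemble classifier operating on gene-level counts, the following is a minimal sketch using scikit-learn. It is not the CanID implementation: the synthetic Poisson counts, the three hypothetical subtypes, the choice of base learners (logistic regression and random forest), and the log1p normalization are all assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for gene-level RNA-seq counts (assumption, not real data):
# 3 hypothetical subtypes, 50 genes, 60 samples per subtype.
n_per_class, n_genes, n_classes = 60, 50, 3
base_rate = rng.uniform(5, 50, size=n_genes)
X_parts, y_parts = [], []
for k in range(n_classes):
    rate = base_rate.copy()
    rate[k * 5:(k + 1) * 5] *= 8  # five over-expressed marker genes per subtype
    X_parts.append(rng.poisson(rate, size=(n_per_class, n_genes)))
    y_parts.append(np.full(n_per_class, k))
X = np.vstack(X_parts).astype(float)
y = np.concatenate(y_parts)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners produce out-of-fold class probabilities (cv=5);
# the final estimator is trained on those stacked predictions.
stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Counts are the only input: log-transform, standardize, then classify.
model = make_pipeline(FunctionTransformer(np.log1p), StandardScaler(), stack)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The stacking layer lets the final estimator weigh base learners that may err on different subtypes, which is one common motivation for ensembles in multi-class settings with imbalanced classes. For the actual model and feature construction, refer to the CanID repository.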
