IntegrateALL: An end-to-end RNA-seq analysis pipeline for multilevel data extraction and interpretable subtype classification in B-precursor ALL

Abstract

Transcriptome sequencing (RNA-seq) is emerging as a diagnostic standard for B-cell precursor acute lymphoblastic leukemia (B-ALL). Expression-based classifiers reach ~95% accuracy, but reproducible end-to-end solutions that also integrate transcript-derived genomic drivers and quantitative virtual karyotyping are lacking. We developed IntegrateALL, a Snakemake pipeline that standardizes RNA-seq analysis from FASTQ to rule-based subtype assignment across 26 WHO-HAEM5/ICC entities by integrating expression-based subtype prediction, gene fusion and hotspot SNV calling, and virtual karyotyping. We introduce KaryALL, a machine learning classifier that uses normalized expression and minor-allele-frequency features (RNASeqCNV) to distinguish near-haploid, hypodiploid, and high-hyperdiploid B-ALL as well as chromosome-21 gains/iAMP21 (accuracy: 0.98; F1 score: 0.96 on 615 independent test samples). SNP-array concordance supported RNA-based karyotyping. Applied to 774 unselected B-ALL cases, IntegrateALL yielded unambiguous subtype assignments in 81.5%, based either on concordance of the gene expression class with a defining driver (75.3% of all cases) or, in selected cases, on high-confidence expression-based classification alone (6.2%); the remaining 18.5% were flagged for manual curation. Independent validation (three cohorts; n = 436, including pediatric cases) reproduced these distributions. Across all patients (n = 1210), 2.6% harbored two subtype-defining drivers. These included hyperdiploidy in fusion-driven subtypes, where it was not expected, and subtype-defining SNVs (e.g., PAX5 P80R, IKZF1 N159Y) co-occurring with BCR::ABL1-positive/-like, KMT2A, or DUX4 fusions. In most dual-driver cases, one subtype gene expression signature predominated. This is consistent with oncogenic hierarchies but does not exclude technical artifacts, so such cases should prompt individual orthogonal validation.
IntegrateALL provides an adaptable, fully reproducible workflow for molecular B-ALL characterization by systematically integrating genomic drivers and downstream gene regulation.
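
The three-way outcome described above (driver-concordant assignment, expression-only assignment, manual curation) can be illustrated with a minimal Python sketch. This is not the IntegrateALL implementation: the `Sample` fields, the `assign_subtype` function, and the `HIGH_CONFIDENCE` cutoff are hypothetical names and values chosen for illustration, and the sketch ignores that expression-only assignment is accepted only in selected cases.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical cutoff: the abstract does not state a confidence threshold.
HIGH_CONFIDENCE = 0.9


@dataclass
class Sample:
    expression_subtype: str         # subtype predicted from gene expression
    expression_confidence: float    # classifier score in [0, 1] (assumed scale)
    driver_subtype: Optional[str]   # subtype implied by a detected fusion, SNV, or karyotype


def assign_subtype(sample: Sample) -> Tuple[str, str]:
    """Return (assignment, reason) following the three outcomes in the abstract."""
    # Outcome 1: expression class agrees with a subtype-defining driver.
    if sample.driver_subtype is not None and sample.driver_subtype == sample.expression_subtype:
        return sample.expression_subtype, "expression concordant with defining driver"
    # Outcome 2: no driver found, but the expression call is high-confidence.
    if sample.driver_subtype is None and sample.expression_confidence >= HIGH_CONFIDENCE:
        return sample.expression_subtype, "high-confidence expression only"
    # Outcome 3: discordant or low-confidence evidence goes to a human reviewer.
    return "manual curation", "discordant or low-confidence evidence"


if __name__ == "__main__":
    examples = [
        Sample("KMT2A", 0.97, "KMT2A"),
        Sample("DUX4", 0.95, None),
        Sample("PAX5 P80R", 0.60, "BCR::ABL1"),
    ]
    for s in examples:
        print(assign_subtype(s))
```

A rule table of this kind keeps every assignment traceable to the evidence that produced it, which is the sense in which the abstract calls the classification interpretable.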
