Abstract
Transcriptome sequencing (RNA-seq) is emerging as a diagnostic standard for B-cell precursor acute lymphoblastic leukemia (B-ALL). Expression-based classifiers reach ~95% accuracy, but reproducible end-to-end solutions that also integrate transcript-derived genomic drivers and quantitative virtual karyotyping are lacking. We developed IntegrateALL, a Snakemake pipeline that standardizes RNA-seq analysis from FASTQ to rule-based subtype assignment across 26 WHO-HAEM5/ICC entities by integrating expression-based subtype prediction, gene fusion and hotspot SNV calling, and virtual karyotyping. We introduce KaryALL, a machine learning classifier that uses normalized expression and minor-allele-frequency features (RNASeqCNV) to distinguish near-haploid, hypodiploid, and high-hyperdiploid B-ALL as well as chromosome-21 gains/iAMP21 (accuracy 0.98, F1 score 0.96 on 615 independent test samples). SNP-array concordance supported RNA-based karyotyping. Applied to 774 unselected B-ALL cases, IntegrateALL yielded unambiguous subtype assignments in 81.5% of cases, based either on concordance of the gene expression class with a defining driver (75.3% of all cases) or, in selected cases, on high-confidence expression-based classification alone (6.2%); the remaining 18.5% were flagged for manual curation. Independent validation (three cohorts; n = 436, including pediatric cases) reproduced these distributions. Across all patients (n = 1210), 2.6% harbored two subtype-defining drivers, including unexpected hyperdiploidy in fusion-driven subtypes and subtype-defining SNVs (e.g., PAX5 P80R or IKZF1 N159Y) co-occurring with BCR::ABL1-positive/-like status or with KMT2A or DUX4 fusions. In most dual-driver cases, the gene expression signature of one subtype predominated, which is consistent with oncogenic hierarchies but also compatible with technical artifacts and should therefore prompt orthogonal validation of individual cases.
IntegrateALL provides an adaptable, fully reproducible workflow for molecular B-ALL characterization by systematically integrating genomic drivers and downstream gene regulation.
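
The assignment logic summarized above (driver-concordant calls, expression-only calls in selected cases, manual curation otherwise) can be illustrated with a minimal Python sketch. It is not the IntegrateALL rule set: the function and field names, the confidence threshold of 0.9, and the set of subtypes allowed to be called from expression alone are hypothetical placeholders chosen only to show the three outcomes.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# Hypothetical values for illustration only; not taken from IntegrateALL.
HIGH_CONFIDENCE = 0.9
EXPRESSION_ONLY_SUBTYPES = frozenset({"BCR::ABL1-like"})


@dataclass(frozen=True)
class Evidence:
    """Per-sample evidence gathered from RNA-seq (assumed structure)."""
    expression_subtype: str          # subtype predicted from gene expression
    expression_confidence: float     # classifier confidence in [0, 1]
    driver_subtypes: FrozenSet[str]  # subtypes implied by fusions, SNVs, karyotype


def assign_subtype(ev: Evidence) -> Tuple[str, Optional[str]]:
    """Return (status, subtype) following the three outcomes in the abstract."""
    # 1. Expression class agrees with a subtype-defining driver.
    if ev.expression_subtype in ev.driver_subtypes:
        return ("driver_concordant", ev.expression_subtype)
    # 2. No driver detected, but a confident call for a subtype that is
    #    (by assumption) acceptable to assign from expression alone.
    if (
        not ev.driver_subtypes
        and ev.expression_subtype in EXPRESSION_ONLY_SUBTYPES
        and ev.expression_confidence >= HIGH_CONFIDENCE
    ):
        return ("expression_only", ev.expression_subtype)
    # 3. Everything else is flagged for manual review.
    return ("manual_curation", None)


if __name__ == "__main__":
    print(assign_subtype(Evidence("KMT2A", 0.70, frozenset({"KMT2A"}))))
    print(assign_subtype(Evidence("BCR::ABL1-like", 0.95, frozenset())))
    print(assign_subtype(Evidence("DUX4", 0.99, frozenset({"KMT2A"}))))
```

The sketch deliberately leaves out any special handling of samples with two subtype-defining drivers; how such cases are resolved in practice is not specified here.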