Abstract
MOTIVATION: Reconstructing clonal lineage trees ("tumor phylogenetics") has become a core tool of cancer genomics. Earlier approaches based on bulk DNA sequencing (DNA-seq) have largely given way to single-cell DNA-seq (scDNA-seq), which offers far greater resolution of clonal substructure. Available data, however, have lagged behind computational theory. While single-cell RNA-seq (scRNA-seq) has become widely available, scDNA-seq remains costly and technically challenging, precluding routine use on large cohorts. This forces tradeoffs between the limited genome coverage of scRNA-seq, the limited availability of scDNA-seq, and the limited clonal resolution of bulk DNA-seq. These limitations are especially problematic for studying structural variations and focal copy number variations, which are crucial to cancer progression but difficult to observe in RNA-seq.

RESULTS: We present TUSV-int, which integrates bulk DNA-seq and scRNA-seq into a single deconvolution and phylogenetic inference framework while accommodating single-nucleotide variants (SNVs), copy number alterations (CNAs), and structural variants (SVs). Using integer linear programming (ILP), we deconvolve heterogeneous variant types and resolve them into a clonal lineage tree. We demonstrate improved deconvolution performance over methods that lack scRNA-seq or use more limited variant types. We further demonstrate the ability of TUSV-int to better resolve clonal structure and mutational histories on a published DNA-seq/scRNA-seq breast cancer dataset.

AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/CMUSchwartzLab/TUSV-INT (https://doi.org/10.5281/zenodo.16884120).