Abstract
BACKGROUND: Cancer phylogenies are key to understanding tumor evolution. However, due to the uncertainty in phylogenetic estimation, one typically infers many equally plausible phylogenies from bulk DNA sequencing data of tumors, which hinders downstream analyses that rely on correct phylogenies. RESULTS: To resolve this challenge, we introduce Sapling, a method to solve two variants of the BACKBONE TREE INFERENCE FROM READS problem, which seeks a small set of backbone trees on a subset of mutations that collectively summarize the space of plausible cancer phylogenies. We prove that both variants are NP-hard. CONCLUSIONS: On simulated and real data, we demonstrate that Sapling infers high-quality backbone trees that adequately summarize the space of plausible cancer phylogenies. In addition, we demonstrate that Sapling can infer full-size trees with higher likelihoods than state-of-the-art methods.