Abstract
MOTIVATION: Phylogenetic trees are tree-like data structures commonly adopted to represent cancer clonal evolution mathematically. The information encoded by phylogenetic trees is important for clinical outcomes, but extracting it automatically is still hard, partly because working directly with tree-like data structures is complex. This is especially true for machine learning tasks, where models are usually designed for vector data.
RESULTS: We introduce CPhyT-GNN, a novel deep learning method to compute unsupervised embeddings of phylogenetic trees. The embeddings learnt by CPhyT-GNN are vectors that can be used for a variety of machine learning tasks. CPhyT-GNN is based on Graph Neural Networks, which produce representations that combine the information provided by the alterations present in the tumor with the topological information provided by the corresponding phylogenetic tree. Experiments with cancer data show that the embeddings learnt by our model are general-purpose and can be applied to different tasks, with results that improve on the state of the art.
AVAILABILITY AND IMPLEMENTATION: Data and code are available at https://github.com/VandinLab/CPhyT-GNN.
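To make concrete how a graph neural network can combine per-node alterations with tree topology into a single vector, here is a minimal sketch. It is not the CPhyT-GNN architecture: it assumes each node of the tree is labelled with a set of alterations, one-hot encodes them over a hypothetical gene vocabulary, applies two rounds of mean-neighbour message passing with random, untrained weights, and mean-pools the node states into one fixed-length embedding. The function names, vocabulary, layer count, and embedding size are all illustrative assumptions.

```python
import random
from typing import Dict, List, Optional


def one_hot(alterations: List[str], vocab: List[str]) -> List[float]:
    """Encode a node's set of alterations as a binary vector over a fixed vocabulary."""
    return [1.0 if a in alterations else 0.0 for a in vocab]


def matvec(w: List[List[float]], x: List[float]) -> List[float]:
    """Multiply matrix w (rows x cols) by vector x (cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def embed_tree(parent: List[Optional[int]], node_alts: List[List[str]],
               vocab: List[str], dim: int = 8, layers: int = 2,
               seed: int = 0) -> List[float]:
    """Hypothetical GNN-style embedding of a rooted tree given as a parent array."""
    n = len(parent)
    # Undirected adjacency derived from the parent array (root has parent None).
    nbrs: Dict[int, List[int]] = {i: [] for i in range(n)}
    for child, p in enumerate(parent):
        if p is not None:
            nbrs[child].append(p)
            nbrs[p].append(child)

    rng = random.Random(seed)
    h = [one_hot(a, vocab) for a in node_alts]  # initial node features
    in_dim = len(vocab)
    for _ in range(layers):
        # Random, untrained weights; a real model would learn these.
        w_self = [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(dim)]
        w_nbr = [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(dim)]
        new_h = []
        for i in range(n):
            # Mean of neighbour states: this is where topology enters.
            agg = [0.0] * in_dim
            for j in nbrs[i]:
                for k in range(in_dim):
                    agg[k] += h[j][k] / len(nbrs[i])
            s = matvec(w_self, h[i])
            m = matvec(w_nbr, agg)
            new_h.append(relu([a + b for a, b in zip(s, m)]))
        h = new_h
        in_dim = dim
    # Mean pooling over nodes gives one fixed-length vector per tree.
    return [sum(h[i][k] for i in range(n)) / n for k in range(dim)]


if __name__ == "__main__":
    vocab = ["TP53", "KRAS", "PIK3CA"]  # hypothetical alteration vocabulary
    parent = [None, 0, 0, 1]            # root 0; nodes 1 and 2 under root; 3 under 1
    alts = [[], ["TP53"], ["KRAS"], ["PIK3CA"]]
    print(embed_tree(parent, alts, vocab))
```

Because pooling averages over nodes, trees with different numbers of clones map to vectors of the same length, which is what lets such embeddings feed standard vector-based machine learning models.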