Abstract
Deep learning offers hope for more efficient phylogenetic inference methods. However, it has yet to have the transformative effect on phylogenetics that it has had in other fields. Here we present a novel approach that combines deep learning with the concepts behind current successful phylogenetic algorithms. Specifically, we give the deep learning algorithm access to the output of a phylogenetic dynamic program run on the sequence alignment, rather than to the raw sequence alignment. The algorithm then learns features from these phylogenetically processed versions of the sequence data, and these features provide information that could inform local tree search. For this paper, our goal is simple: predict, for each edge in a tree, whether or not it is in a maximum parsimony tree. Our model consists of a recurrent neural network that learns features while traversing the input tree; these features are then used to classify the edge. The model makes high-quality predictions for this NP-complete problem on simulated and empirical datasets for trees of various sizes, and we believe it is a stepping stone towards efficient phylogenetic inference using deep learning.
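To illustrate what a "phylogenetic dynamic program on the sequence alignment" can look like, the following is a minimal sketch of Fitch's small-parsimony algorithm, a standard dynamic program for scoring a fixed tree under parsimony. The abstract does not name the dynamic program actually used, so the choice of Fitch's algorithm is an assumption, as are the nested-tuple tree encoding and the names `fitch_site` and `parsimony_score`, which are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple, Union

# Assumed encoding: a tree is a leaf name (str) or a pair (left, right) of subtrees.
Tree = Union[str, tuple]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Post-order Fitch pass for one alignment column.

    Returns the candidate state set at the root of `tree` and the minimum
    number of state changes needed on `tree` for this column.
    """
    if isinstance(tree, str):
        # Leaf: the state set is the observed character, with zero cost.
        return frozenset({states[tree]}), 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change is needed.
        return common, left_cost + right_cost
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Sum the per-column Fitch scores over all columns of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Toy example with four taxa and two alignment columns (illustrative data only).
example_tree = (("a", "b"), ("c", "d"))
example_alignment = {"a": "AC", "b": "AC", "c": "GC", "d": "GT"}
example_score = parsimony_score(example_tree, example_alignment)
```

In this sketch, the per-node state sets and costs are the kind of "phylogenetically processed" quantities that could be passed to a learned model in place of raw sequence columns; how the paper's model actually consumes such output is not specified in the abstract.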