Abstract
Most phylogenetic trees are inferred using time-reversible evolutionary models, which assume that the relative rates of substitution for any given pair of nucleotides are the same regardless of the direction of substitution. However, there is no reason to assume that the underlying biochemical mutational processes that cause substitutions are similarly symmetrical. We consider two non-reversible nucleotide substitution models: (1) a 6-rate non-reversible model (NREV6) that is applicable to analysing mutational processes in double-stranded (ds) genomes, in that complementary substitutions occur at identical rates, and (2) a 12-rate non-reversible model (NREV12) that is applicable to analysing mutational processes in single-stranded (ss) genomes, in that all substitution types are free to occur at different rates. Using likelihood ratio tests and Akaike information criterion-based model comparisons, we show that, surprisingly, NREV12 provided a significantly better fit than the general time reversible (GTR) and NREV6 models to 21/31 dsRNA and 20/30 dsDNA datasets. As expected, NREV12 also provided a significantly better fit to 24/33 ssDNA and 40/47 ssRNA datasets. We then tested how non-reversibility affects the accuracy with which phylogenetic trees are inferred. As simulated degrees of non-reversibility (DNRs) increased, tree topology inferences using both NREV12 and GTR became more accurate, whereas inferred branch lengths became less accurate. We conclude that, while non-reversible models should be helpful in the analysis of mutational processes in most virus species, there is no pressing need to use these models for routine phylogenetic inference.
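
To make the difference between the three models concrete, the following minimal Python sketch builds their instantaneous rate matrices. It is an illustration under stated assumptions, not the implementation used in this study: the function names (build_q, nrev12, nrev6, gtr, satisfies_detailed_balance) are hypothetical, the GTR matrix uses the standard parameterisation q_xy = s_xy * pi_y with symmetric exchangeabilities s, and NREV6 is obtained by tying each substitution x→y to its complementary substitution comp(x)→comp(y).

```python
from itertools import permutations

NUCS = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def build_q(rates):
    """Build a 4x4 rate matrix (nested dict) from {(src, dst): rate}.

    Diagonal entries are set so that every row sums to zero.
    """
    q = {x: {y: 0.0 for y in NUCS} for x in NUCS}
    for (x, y), rate in rates.items():
        q[x][y] = rate
    for x in NUCS:
        q[x][x] = -sum(q[x][y] for y in NUCS if y != x)
    return q


def nrev12(rates12):
    """NREV12: all 12 ordered substitution types have their own rate."""
    if set(rates12) != set(permutations(NUCS, 2)):
        raise ValueError("NREV12 needs a rate for each of the 12 ordered pairs")
    return build_q(rates12)


def nrev6(rates6):
    """NREV6: each substitution shares its rate with its complement.

    rates6 holds one representative per complementary pair,
    e.g. ("A", "C") also fixes the rate of ("T", "G").
    """
    full = {}
    for (x, y), rate in rates6.items():
        full[(x, y)] = rate
        full[(COMPLEMENT[x], COMPLEMENT[y])] = rate
    if len(full) != 12:
        raise ValueError("rates6 must cover all 6 complementary pairs")
    return build_q(full)


def gtr(exch, freqs):
    """GTR: q_xy = s_xy * pi_y, with s symmetric (keyed by frozenset)."""
    full = {
        (x, y): exch[frozenset((x, y))] * freqs[y]
        for x, y in permutations(NUCS, 2)
    }
    return build_q(full)


def satisfies_detailed_balance(q, freqs, tol=1e-12):
    """Check pi_x * q_xy == pi_y * q_yx for the supplied frequencies."""
    return all(
        abs(freqs[x] * q[x][y] - freqs[y] * q[y][x]) <= tol
        for x, y in permutations(NUCS, 2)
    )
```

For example, calling nrev6 with rates for A→C, A→G, A→T, C→A, C→G and C→T yields a matrix in which T→G equals A→C, T→C equals A→G, and so on, whereas a matrix built with gtr satisfies detailed balance with respect to the frequencies it was given. Any rate values passed to these functions are arbitrary illustrations, not estimates from the datasets analysed here.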