Abstract
Phylodynamic analysis has been instrumental in elucidating the epidemiological and evolutionary dynamics of pathogens. Bayesian phylodynamics integrates out phylogenetic uncertainty, which is typically substantial in phylodynamic datasets because of limited genetic diversity. Phylodynamic inference does not, however, scale to modern datasets, partly because of difficulties in traversing tree space. Here, we characterize tree space and the tree landscape in phylodynamic inference and assess their impacts on analysis difficulty and key biological estimates. By running extensive Bayesian analyses of 15 classic large phylodynamic datasets and carefully analyzing the posterior samples, we find that the posterior tree landscape is diffuse yet rugged, leading to widespread tree sampling problems that usually stem from sequences in a small part of the tree. We develop clade-specific diagnostics to show that a few sequences (including putative recombinants and recurrent mutants) frequently drive the ruggedness and sampling problems, although existing data-quality tests show limited power to detect them. The sampling problems can significantly affect phylodynamic inferences or distort major biological conclusions; the impact is usually stronger on "local" estimates (e.g., introduction history) associated with particular clades than on "global" parameters (e.g., demographic trajectory) governed by general tree shape. We evaluate existing Markov chain Monte Carlo diagnostics alongside the diagnostics developed here, and offer strategies for optimizing phylodynamic analysis settings and mitigating the impacts of sampling problems. Our findings highlight the need for efficient traversal of rugged tree landscapes and point to directions for developing it, ultimately advancing scalable and reliable phylodynamics.