Abstract
The COVID-19 pandemic has underscored the need for accurate epidemic forecasting to predict pathogen spread and evolution and to evaluate intervention strategies. Forecast reliability hinges on detailed knowledge of disease transmission across population segments, which may be inferred from contact surveys or mobility data. However, these indirect approaches make it difficult to estimate rare transmissions between socially or geographically distant communities. We show that the steep ramp-up of genome sequencing surveillance during the pandemic can be leveraged to directly identify transmission patterns between geographically defined communities. Our approach uses a hidden Markov model to infer the fraction of infections a community imports from others based on how rapidly allele frequencies in the focal community converge to those in the donor communities. Applying this method to SARS-CoV-2 sequencing data from England and the United States, we uncover networks of intercommunity transmission that reflect geographical relationships while exposing significant long-range interactions. The scaling of importation rate with distance is consistent across both countries, yet weaker than expected based on mobility data, highlighting limitations of indirect inference. We show that transmission patterns can change between waves of variants of concern and analyze how the inferred heterogeneity in intercommunity transmission impacts evolutionary forecasts. Although demonstrated here on geographically defined communities, our approach could be extended to communities defined by other traits (e.g., age, socioeconomic status), provided time-series data can be stratified accordingly. Overall, our study highlights population genomic time-series data as a crucial record of epidemiological interactions, which can be deciphered using tree-free inference methods.
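To illustrate the intuition that importation drives allele-frequency convergence, the following is a minimal sketch and not the hidden Markov model used in the study. It assumes a deterministic, noise-free toy model with a single donor community, in which a fraction `m` of the focal community's infections is imported each generation, so that the frequency gap to the donor shrinks by a factor of `1 - m` per generation. The function names, the parameter `m`, and the least-squares estimator are all illustrative assumptions.

```python
import numpy as np


def simulate_focal(x0, donor, m):
    """Toy model (assumption): x[t+1] = (1 - m) * x[t] + m * y[t].

    x0    -- initial allele frequency in the focal community
    donor -- allele frequencies y[t] in the single donor community
    m     -- fraction of focal infections imported per generation
    """
    x = [x0]
    for y in donor[:-1]:
        x.append((1 - m) * x[-1] + m * y)
    return np.array(x)


def estimate_import_fraction(focal, donor):
    """Least-squares estimate of m from x[t+1] - x[t] = m * (y[t] - x[t])."""
    focal = np.asarray(focal, dtype=float)
    donor = np.asarray(donor, dtype=float)
    dx = focal[1:] - focal[:-1]      # change in focal frequency per step
    gap = donor[:-1] - focal[:-1]    # donor-minus-focal frequency gap
    denom = np.dot(gap, gap)
    if denom == 0:
        raise ValueError("frequencies are identical; m is not identifiable")
    return float(np.dot(gap, dx) / denom)


# Hypothetical example: donor allele held at 0.8, focal community starts at 0.2.
donor = np.full(20, 0.8)
focal = simulate_focal(0.2, donor, m=0.1)
m_hat = estimate_import_fraction(focal, donor)
```

In this idealized setting the estimator recovers `m` exactly. Real sequencing data add sampling noise, genetic drift, and multiple donor communities, which is the kind of uncertainty a probabilistic model such as a hidden Markov model is suited to handle.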