Abstract
Inferring the fitness effects of mutations is a basic problem in understanding how populations evolve over time. When multiple mutations are present in a population simultaneously, genetic linkage comes into play: the fate of an individual mutation depends both on its own fitness effect and on the genetic background on which it occurs. Accurate inference of fitness effects in evolutionary systems with multiple competing mutations therefore depends on resolving the confounding effects of genetic linkage, which are captured by the covariance between allele pairs. Evolutionary studies increasingly use short-read sequencing technologies to produce detailed snapshots of evolving populations. This poses a problem because allele-pair frequencies cannot be observed for loci separated by more than the read length, which hampers any attempt to resolve linkage effects between pairs of loci that reside on different reads. Here we present a computationally efficient pipeline for inferring selection from short-read time-series data with partial allele-pair frequency information, while accounting for linkage. Simulation results show that the method performs well and scales to systems with several thousand variants. We also demonstrate the pipeline's utility on real datasets of within-host HIV and SARS-CoV-2 evolution, showing that it can resolve linkage effects in complex evolutionary histories.
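To make the missing-linkage problem concrete, the following is a minimal sketch, not the pipeline described here, of how single-site frequencies x_i and allele-pair covariances C_ij = x_ij - x_i x_j might be estimated from short reads. It assumes biallelic loci encoded as 0/1 and a hypothetical read format of (start, alleles) covering consecutive loci; the function name `pair_covariance_from_reads` is illustrative. Pairs of loci never spanned by a single read are left as NaN, which is exactly the partial allele-pair information the abstract refers to.

```python
import numpy as np


def pair_covariance_from_reads(reads, n_loci):
    """Estimate allele frequencies x_i and pair covariances C_ij from short reads.

    Hypothetical input format (an assumption for illustration): each read is
    (start, alleles), where alleles is a sequence of 0/1 values for the
    consecutive loci start, start+1, ...
    C_ij is NaN when no single read covers both loci i and j.
    """
    single_sum = np.zeros(n_loci)
    single_depth = np.zeros(n_loci)
    pair_sum = np.zeros((n_loci, n_loci))
    pair_depth = np.zeros((n_loci, n_loci))

    for start, alleles in reads:
        a = np.asarray(alleles, dtype=float)
        idx = np.arange(start, start + len(a))
        block = np.ix_(idx, idx)
        single_sum[idx] += a
        single_depth[idx] += 1
        # Every pair of loci on the same read is observed jointly.
        pair_sum[block] += np.outer(a, a)
        pair_depth[block] += 1

    # Single-site frequencies (NaN where a locus has no coverage).
    x = np.full(n_loci, np.nan)
    np.divide(single_sum, single_depth, out=x, where=single_depth > 0)

    # Joint frequencies x_ij, only where some read spans both loci.
    x_pair = np.full((n_loci, n_loci), np.nan)
    np.divide(pair_sum, pair_depth, out=x_pair, where=pair_depth > 0)

    # Covariance C_ij = x_ij - x_i x_j; NaN propagates for unobserved pairs.
    C = x_pair - np.outer(x, x)
    return x, C


# Toy data: reads of length 3 over 5 loci.
reads = [(0, [1, 1, 0]), (0, [0, 0, 1]), (2, [1, 0, 1]), (2, [0, 1, 0])]
x, C = pair_covariance_from_reads(reads, n_loci=5)
# C[0, 4] is nan: no 3-site read spans both locus 0 and locus 4.
```

In this toy example every locus has frequency 0.5, and loci 0 and 1 are positively linked (C[0, 1] = 0.25), but the covariance between loci 0 and 4 cannot be computed from the reads at all. Any linkage-aware inference of selection must fill in or otherwise handle such entries.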