Abstract
Scaling is a common practice in population genetic simulations to increase computational efficiency. However, few studies have systematically examined the effects of scaling on diversity estimates or the comparability of scaled results to unscaled simulations and empirical data. We investigate the effects of scaling in two species, modern humans and Drosophila melanogaster. These species differ starkly in population size and generation time, so that humans require moderate or no scaling whereas Drosophila requires dramatic scaling. We determine how coalescence, runtime, memory, estimates of diversity, the site frequency spectrum, and linkage disequilibrium are influenced by scaling. We also examine the impact of simulated segment length and burn-in time on these metrics. Our results demonstrate that while computational efficiency improves with scaling, large scaling factors distort genetic diversity and the dynamics among genetic variants, resulting in deviations from both the intended model and empirical observations. Specifically, strongly scaled simulations may experience stronger negative selection on deleterious mutations, which amplifies background selection and purges linked mutations, leaving only rare, strongly deleterious variants in the final population. We additionally show that the heuristic burn-in length of 10N generations is often insufficient for full coalescence in both models and alters expected linkage disequilibrium patterns. Finally, we provide considerations for conducting scaled simulations and offer potential strategies for mitigating scaling effects. For most simulations of nonmodel species, we advocate a bespoke scaling strategy drawn from these use cases.
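
To make the notion of scaling concrete, the following minimal sketch applies one widely used rescaling convention, which is an assumption here and not necessarily the exact procedure used in this study: for a scaling factor Q, population size and generation counts are divided by Q, while per-generation mutation rate, recombination rate, and selection coefficients are multiplied by Q, so that the products N·mu, N·r, and N·s are preserved. The names `SimParams`, `scale_params`, and `burn_in_generations` are hypothetical, and the numerical values are illustrative rather than taken from either species model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Hypothetical container for forward-simulation parameters."""
    pop_size: int              # N, number of diploid individuals
    mutation_rate: float       # mu, per site per generation
    recombination_rate: float  # r, per site per generation
    selection_coeff: float     # s, negative for deleterious mutations
    generations: int           # number of generations to simulate


def scale_params(p: SimParams, q: float) -> SimParams:
    """Rescale parameters by factor q under the assumed convention.

    N and generation counts are divided by q; mu, r, and s are
    multiplied by q. Multiplying r by q is a linear approximation
    that is only reasonable while q * r stays small.
    """
    if q < 1:
        raise ValueError("scaling factor must be >= 1")
    return SimParams(
        pop_size=max(1, round(p.pop_size / q)),
        mutation_rate=p.mutation_rate * q,
        recombination_rate=p.recombination_rate * q,
        selection_coeff=p.selection_coeff * q,
        generations=max(1, round(p.generations / q)),
    )


def burn_in_generations(pop_size: int, multiple: int = 10) -> int:
    """Heuristic burn-in length of `multiple` * N generations."""
    return multiple * pop_size


# Illustrative values only (not the human or Drosophila models).
unscaled = SimParams(10_000, 1e-8, 1e-8, -0.001, 100_000)
scaled = scale_params(unscaled, q=10)
print(scaled)
print("burn-in (scaled):", burn_in_generations(scaled.pop_size))
```

Under this convention the scaled selection coefficient grows linearly with Q, so a large Q applied to already strongly deleterious mutations pushes |s| toward values where per-generation selection no longer resembles that of the unscaled model, which is consistent with the amplified negative selection described above.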