Abstract
Using the Telomere-to-Telomere reference, we assemble the distribution of simple tandem repeat lengths present in the human genome. Analyzing over three hundred mammalian genomes, we find remarkable consistency in the shape of the distribution across evolutionary epochs. All observed genomes harbor an excess of long repeats, which are potentially prone to the expansions that underlie repeat expansion disorders. We measure mutation rates for repeat length instability, quantitatively model the per-generation action of mutations, and observe the corresponding long-term behavior shaping the repeat tract length distribution. We find that short repetitive sequences appear to be a straightforward consequence of random substitution. Evolving largely independently, longer repeats (above roughly 10 nt) emerge and persist in a rapidly mutating dynamic balance between expansion, contraction, and interruption. Collectively, these mutational processes are sufficient to explain the abundance of long repeats without invoking natural selection. Our analysis constrains the properties of the molecular mechanisms that maintain genome fidelity and underlie repeat instability.