Abstract
Tandem repeats (TRs) are among the most mutable loci in the human genome, but the genomic determinants of TR mutagenesis remain poorly understood. We used PacBio HiFi long-read sequencing to profile nearly eight million TR loci in 28 members of a large, four-generation CEPH/Utah family designated K1463. We identified 1,270 de novo TR expansions and contractions across 20 children in the pedigree. De novo mutations were more likely to occur at loci that were longer, composed of uninterrupted motif sequences, and heterozygous in the parental germline. Children born to older fathers also exhibited more de novo mutations at short tandem repeats (STRs). A total of 43 TR loci were hyper-mutable in K1463, expanding or contracting up to twelve times across the pedigree. One hyper-mutable locus harbored multiple distinct variable number tandem repeat (VNTR) motifs, yet only one of these motifs mutated recurrently across generations. The mutable motif differed from the next most common motif by just two base pairs, suggesting that TR mutability may be influenced by subtle differences in motif composition. Overall, this study combines long-read sequencing technologies with new software tools to comprehensively investigate the factors that influence TR mutagenesis.