Abstract
With the increasing availability of long-read sequencing data, high-quality human genome assemblies, and software for fully characterizing tandem repeats, genome-wide genotyping of tandem repeat loci at population scale is becoming increasingly feasible. Such efforts not only expand our knowledge of the tandem repeat landscape in the human genome but also enhance our ability to differentiate pathogenic tandem repeat mutations from benign polymorphisms. To this end, we analyze long-read sequencing datasets from 272 individuals. Here, we report a catalog of over 5 million tandem repeat loci, many of which were previously unannotated. Some of these loci are highly polymorphic, and many reside within protein-coding sequences.