Abstract
Mirror DNA repeats were found in genomic DNA several decades ago, but their role and the mechanisms leading to their abundance have remained a mystery. The only firmly established functional property was that the subset of long homopurine-homopyrimidine mirror repeats (H-motifs) can form a triple-helical DNA secondary structure (H-DNA). Here, we analyzed the sequence content of mirror repeats in the telomere-to-telomere human genome sequence. Our findings suggest that long mirror repeats in genomic DNA originate exclusively from the expansion of simple tandem repeats (STRs). Strikingly, long H-motifs are highly overrepresented compared to all other mirror repeats and STRs. We hypothesize that long H-motif STRs could be particularly expansion-prone owing to H-DNA-mediated genome instability, pointing to the length at which this structure becomes a significant hindrance.
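To make the terms above concrete, the following is a minimal illustrative sketch, not the analysis pipeline used in this work. It assumes the standard definitions: a mirror repeat reads identically in both directions on the same strand, and an H-motif is a mirror repeat whose strand consists only of purines (A, G) or only of pyrimidines (C, T). The function names are hypothetical, and the sketch considers only perfect mirror repeats without a spacer, which is a simplification.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def is_mirror_repeat(seq: str) -> bool:
    """True if the sequence equals its own reverse (same strand, no complement)."""
    s = seq.upper()
    return len(s) > 0 and s == s[::-1]


def is_h_motif(seq: str) -> bool:
    """True if the sequence is a mirror repeat made only of purines or only of pyrimidines."""
    s = seq.upper()
    bases = set(s)
    return is_mirror_repeat(s) and (bases <= PURINES or bases <= PYRIMIDINES)


def longest_mirror_repeat(seq: str) -> str:
    """Return the longest perfect mirror repeat in seq (expand-around-center, O(n^2))."""
    s = seq.upper()
    best = ""
    for center in range(len(s)):
        # Try both an odd-length center (one base) and an even-length center (between two bases).
        for left, right in ((center, center), (center, center + 1)):
            while left >= 0 and right < len(s) and s[left] == s[right]:
                left -= 1
                right += 1
            candidate = s[left + 1:right]
            if len(candidate) > len(best):
                best = candidate
    return best


# A tandem repeat of a homopurine unit contains a long mirror repeat that is also an H-motif.
str_tract = "GAA" * 4
mirror = longest_mirror_repeat(str_tract)
print(mirror, is_h_motif(mirror))
```

In this toy case, four tandem copies of the homopurine unit GAA already contain an 11-nucleotide perfect mirror repeat, which illustrates how an expanding STR can give rise to a mirror repeat that grows with the tract. By contrast, a sequence such as GATTAG is a mirror repeat but not an H-motif, because it mixes purines and pyrimidines on one strand.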