Abstract
Tandem repeat sequences (TRs), a class of repetitive genomic elements, are broadly distributed across both coding and non-coding regions. Elucidating the relationship between sequence and function is essential for understanding the genome. Saccharomyces cerevisiae is a key model organism and a widely used chassis for strain engineering. Although the transcriptional regulatory functions of TRs in S. cerevisiae promoters have been elucidated, their roles within coding sequences (CDS) remain poorly understood. In this study, we integrate RNA-seq, ChIP-seq, ATAC-seq, Hi-C, and Micro-C data from S. cerevisiae to characterize the types and distribution of TRs and their impact on gene expression. Our results indicate that genes containing short tandem repeats (STRs) in their CDS exhibit lower expression levels. Epigenetic analysis reveals that these regions carry high levels of repressive histone modifications and low levels of activating marks, together with reduced chromatin accessibility and fewer chromatin interactions. Furthermore, trinucleotide and hexanucleotide STR motifs are primarily enriched in genes encoding transcriptional regulatory proteins. This study provides new insights into the functions and characteristics of STRs in the CDS of S. cerevisiae, and the key STR motifs identified here offer potential targets for the design of transcriptional regulatory elements.
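To make the notion of an STR and its motif length (e.g. trinucleotide vs. hexanucleotide) concrete, the following Python sketch scans a sequence for perfect, uninterrupted tandem repeats. It is only an illustration, not the detection pipeline used in this study: the function name `find_perfect_strs` and the thresholds (repeat units of 1 to 6 bp, at least three consecutive copies) are assumptions chosen for the example, and imperfect repeats are ignored.

```python
import re
from typing import List, NamedTuple


class STR(NamedTuple):
    start: int   # 0-based start of the repeat tract
    end: int     # exclusive end of the repeat tract
    motif: str   # repeat unit as it appears at the leftmost match position
    copies: int  # number of consecutive copies of the unit


def find_perfect_strs(seq: str, min_period: int = 1, max_period: int = 6,
                      min_copies: int = 3) -> List[STR]:
    """Hypothetical helper: report perfect tandem repeats for each unit length.

    Thresholds are illustrative assumptions, not values from the study.
    """
    seq = seq.upper()
    hits: List[STR] = []
    for period in range(min_period, max_period + 1):
        # Capture one unit of `period` bases, then require it to recur
        # at least (min_copies - 1) more times back to back.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (period, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are themselves repeats of a shorter unit,
            # e.g. "ATAT" (period 4) is really the dinucleotide "AT".
            if any(unit == unit[:d] * (period // d)
                   for d in range(1, period) if period % d == 0):
                continue
            length = m.end() - m.start()
            hits.append(STR(m.start(), m.end(), unit, length // period))
    return hits


if __name__ == "__main__":
    for h in find_perfect_strs("CAGCAGCAGCAGTTTT"):
        print(h, "unit length:", len(h.motif))
```

On the toy sequence above, the sketch reports a mononucleotide T tract and a trinucleotide CAG tract with four copies each. Because the regex reports whichever phase it meets first, a rotation of the same motif (e.g. GCA instead of CAG) can be returned when the tract starts mid-unit; a real analysis would typically normalize motifs to a canonical rotation and strand.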