Abstract
BACKGROUND: DNA G-quadruplexes (G4s) are four-stranded DNA structures. Endogenous G-quadruplexes (eG4s) have been identified as pivotal regulatory elements for gene expression in the human genome. Evolutionary conservation can be used to assess the functional relevance of putative regulatory elements. However, the evolutionary profiles of human eG4s remain largely unknown.

RESULTS: Here, we construct mammalian evolutionary profiles of human eG4s based on a comprehensive reference annotation built by integrating the eG4 database EndoQuad, which covers 41 human cell lines, with our in-house G4 CUT&Tag data covering seven cell lines. We find that transposable elements contribute substantially to the evolutionary spread of primate-specific eG4s. We identify 92,910 highly conserved human eG4s under mammalian constraint. Using eG4finder, an eG4 prediction tool that we developed based on a large language model, we verify the high structural conservation of highly conserved eG4s. The enrichment of highly conserved eG4s in developmental and aging pathways highlights their potential significance in key biological processes. Notably, highly conserved eG4s exhibit higher regulatory potential, regulatory activity and affinity for transcription factors. We demonstrate that highly conserved eG4s are the most powerful transcriptional activation elements in the total eG4 collection. Moreover, trait-associated variants and variants affecting the expression of genes with high phenotypic severity are most enriched in highly conserved eG4s.

CONCLUSIONS: Our study highlights the important regulatory functions of human eG4s that are highly conserved in the mammalian lineage and their close association with complex human traits.