Abstract
The origins of CpG islands (CGIs) remain unknown. CGIs are relatively short, GC-rich regions of DNA in which CpG dinucleotides occur more often than expected, compared with most of the genome. They make up less than 1% of the human genome but harbor approximately 40% of all transcription start sites (TSSs). In somatic cells, CGIs are usually regulated by histone modifications. In a minority of cases they are instead permanently silenced by CpG methylation. CGIs that lack TSSs are called "orphan CGIs". Here, we show that CGIs containing TSSs almost never contain any of three major classes of transposable elements (TEs), and orphan CGIs rarely do. We hypothesize that CGIs have persisted across evolutionary time because of counterselection against TE insertion in the germ line. The remaining 99% of the vertebrate genome, which is not CG-rich, contains 60% of TSSs and putative enhancers. We further postulate that the conversion of an ancestral CpG-rich genome into the CpG-depleted genome of present-day vertebrates may also have allowed reversible DNA methylation to function in complex and dynamic gene control circuits. We therefore propose an evolutionary model in which vertebrate TEs are indirectly responsible for two things. First, they account for the existence of CGIs. Second, they account for the formation of regulatory elements, such as TSSs and enhancers, that can potentially use dynamic DNA methylation for gene control.
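The abstract does not say how "higher-than-expected" CpG occurrence is measured. The sketch below assumes the widely used Gardiner-Garden and Frommer operational definition. Under that definition, a CGI-like region has the following properties:

- a length of at least 200 bp
- GC content above 50%
- a CpG observed/expected ratio above 0.6

The ratio is computed as (number of CpG × length) / (number of C × number of G). The thresholds and the helper names `cpg_stats` and `is_cgi_like` are illustrative assumptions. They are not necessarily the criteria used in this work.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, CpG observed/expected ratio) for a DNA sequence.

    Assumes the Gardiner-Garden and Frommer style ratio:
    obs/exp = (#CpG * length) / (#C * #G).
    """
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c = s.count("C")
    g = s.count("G")
    # "CG" cannot overlap itself, so str.count gives the exact CpG count.
    cpg = s.count("CG")
    gc_fraction = (c + g) / n
    if c == 0 or g == 0:
        return gc_fraction, 0.0  # no C or no G: expected CpG is zero
    obs_exp = cpg * n / (c * g)
    return gc_fraction, obs_exp


def is_cgi_like(seq: str, min_len: int = 200,
                min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Hypothetical classifier using assumed (not source-given) thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_oe


if __name__ == "__main__":
    # GC-rich and CpG-rich: passes all three assumed criteria.
    print(is_cgi_like("CG" * 100))
    # GC-rich (80% GC) but CpG-depleted: fails on the obs/exp ratio.
    print(is_cgi_like("CCTGG" * 40))
```

The second example shows why GC content alone is not enough. A sequence can be GC-rich yet depleted of CpG, like most of the vertebrate genome. The observed/expected ratio is what separates CGI-like sequence from such regions.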