Abstract
Natural selection strongly influences the evolutionary trajectories of species by acting on their genotype-to-phenotype transitions. At the molecular level, these transitions are shaped by regulatory sequences. In this study, we combined population and comparative genomics to investigate how natural selection affects specific classes of regulatory sequence involved in transcription factor-DNA interactions. These interactions involve two types of motifs: transcription factor-binding domains and transcription factor-binding sites. Using publicly available annotation data for Homo sapiens, Arabidopsis thaliana, and Drosophila melanogaster, we first constructed species-specific lists of transcription factor-binding domain regions. Applying commonly used summary statistics, we found signals of purifying selection acting on transcription factor-binding domains, consistent with their functional importance. Next, using biochemical assay-based annotations, we identified potential transcription factor-binding site regions and treated variants within them as nonsynonymous equivalents. Interestingly, we also observed that noncoding transcription factor-binding site regions showed levels of constraint similar to those of coding regions in populations with large effective population size (Ne). Signals of positive selection were limited. Nevertheless, McDonald-Kreitman estimates revealed that, in both fruit fly and thale cress, the proportion of adaptive substitutions (α) for transcription factor-binding domains was consistently higher than for adjacent nonbinding domains, whereas no such difference was apparent in humans. Taken together, our comparative analysis shows that the efficiency of negative selection, and to a lesser extent positive selection, on transcription factor-DNA interface elements scales with effective population size. The dataset and analysis pipeline provide a baseline for future studies of regulatory evolution across coding and noncoding regions.
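For readers unfamiliar with the McDonald-Kreitman estimates mentioned above, the following is a minimal sketch of the standard α estimator, α = 1 - (Ds·Pn)/(Dn·Ps). It is not the study's pipeline: the function name `mk_alpha` and the example counts are hypothetical, and for binding sites we assume that variants inside the site play the role of the "nonsynonymous" class while a putatively neutral class plays the role of "synonymous".

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Standard McDonald-Kreitman estimate of the proportion of adaptive substitutions.

    dn, ds: fixed differences between species (selected class, neutral class)
    pn, ps: polymorphisms within a species (selected class, neutral class)
    All counts are assumed to come from the same set of regions.
    """
    if dn == 0 or ps == 0:
        # The estimator is undefined when either denominator term is zero.
        raise ValueError("alpha is undefined when dn or ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not data from the study).
    print(mk_alpha(dn=10, ds=20, pn=5, ps=20))
```

A positive α suggests that some fraction of substitutions in the selected class were fixed by positive selection, whereas a negative α usually indicates segregating, slightly deleterious polymorphisms inflating Pn.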