Abstract
The growing availability of genomes from non-model organisms offers new opportunities to identify functional loci underlying trait variation through comparative genomics. Although cis-regulatory regions drive much of phenotypic evolution, linking them to specific functions remains challenging. We identified 514 cis-regulatory motifs enriched in regulatory regions of five diverse grass species, 73% of which were consistently enriched across all five species, suggesting a deeply conserved regulatory code. Leveraging 57 new contig-level genome assemblies, we then quantified shared occupancy of specific motif instances within gene-proximal regions across 589 grass species, revealing widespread motif gain and loss over evolutionary time. Shared occupancy declined rapidly over the first few million years of divergence, yet ∼50% of motif instances remained shared back to the origin of grasses ∼100 million years ago. We used phylogenetic mixed models to identify motif gains and losses associated with ecological niche transitions. These models revealed significant environmental associations for 1282 motif-orthogroup combinations, including convergent gains of HSF/GARP motifs at an alpha-N-acetylglucosaminidase gene that were associated with occurrence in temperate environments. Our findings support a "stable motifs, variable binding sites" model in which cis-regulatory evolution involves turnover of thousands of individual binding site instances while largely preserving transcription factors' binding preferences. Our results highlight the potential of comparative genomics and phylogenetic mixed models to reveal the genetic basis of complex traits.