Abstract
The evolution of genes de novo from ancestrally nongenic sequences may be a significant mechanism of gene origin. Many studies have focused on identifying de novo genes in distant evolutionary comparisons, an approach that biases the sample of de novo genes toward older genes that have acquired important functions and have been retained and refined by selection. In this report, we focus on the earliest steps in de novo gene origin by identifying young, polymorphic transcripts that may be missed by other study designs. To accomplish this, we sequenced tissue transcriptomes from a much larger sample of genotypes than has been used in previous analyses of de novo genes in Drosophila melanogaster. We identified 90 potential species-specific de novo genes expressed in the male accessory glands of 29 D. melanogaster lines derived from the same natural population. We find that most young transcripts are both rare in the population and transcribed at low abundance. Improved sampling of both ingroup and outgroup genotypes reveals that many young genes are polymorphic in more than one species, resulting in substantial uncertainty about the age and phylogenetic distribution of de novo genes. Among the genes expressed in the same tissue, gene age correlates with proximity to other tissue-specific genes, with the youngest genes being least likely to occur near established tissue-specific genes. This and other lines of evidence suggest that de novo genes do not commonly evolve by simply reutilizing preexisting regulatory elements. Together, these results provide new insights into the origin and early evolution of de novo genes.