Abstract
Examination of the genome sequences of Saccharomyces cerevisiae strain S288c and 93 additional diverse strains identifies the 5885 genes that make up the core gene set of this species and gives a better sense of the organization and plasticity of its genome. Each S. cerevisiae strain contains dozens to hundreds of strain-specific genes. In addition to a variable content of retrotransposons Ty1-Ty6, some strains contain a novel transposable element, Ty7. The examination further shows that some annotated putative protein-coding genes are likely artifacts. We propose altering approximately 5% of the current annotations in the widely used reference strain S288c. Potential null alleles are common: they are found in all 94 strains examined and typically contain a single stop codon or frameshift. There are also gene remnants, pseudogenes, and variable arrays of genes. Among the core genes there are now only 364 protein-coding genes of unknown function, classified as uncharacterized in the Saccharomyces Genome Database. This work suggests that there is a role for carefully edited and annotated genome sequences in understanding the genome organization and content of a species. We propose that gene remnants be added to the repertoire of features found in the S. cerevisiae genome, and likely in the genomes of other fungal species.