Abstract
The multicopy 47S ribosomal RNA (rRNA) genes are among the most highly expressed genes in the human genome, yet to date essentially no disease-causing sequence variants have been identified in them. This lack of disease association is surprising, because defects in 47S rRNA transcription, changes in ribosomal protein dosage, and nucleotide changes in the mitochondrial rRNA all result in disease. The failure to identify rRNA-associated diseases may thus stem primarily from the experimental challenges of analyzing this chromosomally isolated, high-copy gene family. Here, we used an evolutionary approach to test whether mutations in the human 47S genes can have phenotypic consequences. By analyzing sequence variants among rRNA genes across >3,000 individuals from the high-coverage 1000 Genomes Project, we demonstrate that variant abundance is highly stratified across the 47S rRNA genes. In individual genomes, variants were frequently amplified in the transcribed spacer sequences and the evolutionarily young expansion segments, but rarely in the conserved 18S, 5.8S, and 28S rRNA-encoding sequences. Variant numbers and amplification were lowest in evolutionarily highly constrained nucleotide elements that are identical across >90% of sequenced eukaryotes. These results indicate that strong purifying selection suppresses copy number expansion of deleterious variants among the hundreds of 47S rRNA copies, and they imply that deleterious variants in the 47S rRNA can have phenotypic consequences even at very low copy numbers. Because low-copy variant calls are rarely considered in association studies, this may explain why disease associations with 47S rRNA variants have so far escaped detection.