Abstract
BACKGROUND: Hereditary hemorrhagic telangiectasia (HHT) is a near-fully penetrant autosomal dominant disorder characterized by nosebleeds, anemia, and arteriovenous malformations. The great majority of HHT cases are caused by heterozygous loss-of-function mutations in ACVRL1 or ENG, which encode proteins that function in bone morphogenetic protein signaling. HHT prevalence is estimated at 1 in 5000, and HHT is accordingly classified as a rare disease. However, HHT is suspected to be underdiagnosed. METHODS: To estimate the true prevalence of HHT, we summed the allele frequencies of predicted pathogenic variants in ACVRL1 and ENG using 3 methods. For method 1, we included Genome Aggregation Database (gnomAD v4.1) variants with ClinVar annotations of pathogenic or likely pathogenic, plus unannotated variants with a high probability of causing disease. For method 2, we evaluated all ACVRL1 and ENG gnomAD variants using threshold filters based on accessible in silico pathogenicity prediction algorithms. For method 3, we developed a machine learning-based classification system to improve the classification of missense variants. RESULTS: We calculated an HHT prevalence of between 2.1 in 5000 and 11.9 in 5000, or 2 to 12× higher than current estimates. Applying our machine learning-based classification method showed that missense variants are the greatest contributor to pathogenic allele frequency and that HHT prevalence is similar across genetic ancestries. CONCLUSIONS: Our results support the notion that HHT is underdiagnosed and that HHT prevalence may exceed the prevalence threshold for a rare disease.
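The abstract does not spell out how summed allele frequencies become a prevalence. One common approach, sketched below in Python under stated assumptions (Hardy-Weinberg equilibrium, full penetrance, and all pathogenic alleles in ACVRL1 and ENG pooled into a single allele class with cumulative frequency q), takes the expected fraction of carriers to be 1 - (1 - q)^2, which is approximately 2q for rare alleles. The function names and the example allele frequencies are hypothetical; this is not the authors' code, and the numbers are not gnomAD values.

```python
import math

# Baseline prevalence quoted in the abstract: 1 in 5000.
BASELINE_PREVALENCE = 1 / 5000


def prevalence_from_allele_frequencies(allele_freqs: list[float]) -> float:
    """Estimate the fraction of individuals carrying >= 1 pathogenic allele.

    Assumes Hardy-Weinberg equilibrium, full penetrance, and that the summed
    allele frequency q can be treated as one pathogenic allele class. Under
    these assumptions the carrier fraction is 1 - (1 - q)**2 = 2q - q**2,
    which is approximately 2q when q is small.
    """
    q = math.fsum(allele_freqs)  # cumulative pathogenic allele frequency
    if not 0.0 <= q <= 1.0:
        raise ValueError("summed allele frequency must lie in [0, 1]")
    return 1.0 - (1.0 - q) ** 2


def per_5000(prevalence: float) -> float:
    """Express a prevalence as cases per 5000 individuals."""
    return prevalence * 5000


def fold_over_baseline(prevalence: float) -> float:
    """Ratio of an estimated prevalence to the 1-in-5000 baseline."""
    return prevalence / BASELINE_PREVALENCE


if __name__ == "__main__":
    # Hypothetical per-variant allele frequencies (illustrative, NOT gnomAD data).
    hypothetical_freqs = [1.0e-4, 6.0e-5, 4.0e-5]
    p = prevalence_from_allele_frequencies(hypothetical_freqs)
    print(f"{per_5000(p):.2f} in 5000 ({fold_over_baseline(p):.2f}x baseline)")
```

The exact form 1 - (1 - q)^2 is kept rather than 2q because it costs nothing; at the allele frequencies implied by the abstract's estimates (q of roughly 0.0002 to 0.0012), the two differ by less than 0.1%.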