Abstract
BACKGROUND: Comorbidity frequently manifests in pediatric diseases, especially in children with congenital defects caused by chromosomal aberrations, such as Down syndrome (DS). Despite their clinical significance for diagnosis and treatment, disease comorbidities are often overlooked in genetic studies because of their complexity. Recurrent cerebral microbleeds are observed in a subset of patients with DS and may represent an early indicator of cognitive decline and Alzheimer’s disease (AD), which affects at least 40% of all DS cases. Little attention has been directed toward exploring the genomic factors that contribute to extensive brain hemorrhages and their subsequent connection with dementia, or toward other comorbidities, such as epilepsy, that may also present as early indicators of dementia or AD.
METHODS: In this study, we examined 1,134 whole-genome sequencing (WGS) samples with DNA derived from blood, comprising 709 patients diagnosed with DS and 425 healthy family members. Among the 709 cases, 20 DS patients had a documented history of brain hemorrhage, while 83 exhibited severe epilepsy. Unsupervised machine learning algorithms were applied to the genomic variants identified in the WGS data to generate genotype clusters. In parallel, the cohort patients’ electronic medical records (EMR) were extracted, encompassing 443 self-reported medical symptoms, 2,206 abnormal lab tests, and 3,499 International Classification of Diseases (ICD) codes across 10 major pediatric disease categories. Association analysis was then conducted between the genotype clusters and each phenotype.
RESULTS: For DS patients with brain hemorrhage, we identified exonic mutations in eight genes associated with cerebral hemorrhage (FDR = 0.04) and genomic variants in 11 genes related to neovascularization (FDR = 0.003). Genomic variants associated with brain hemorrhage were also significantly enriched in pathways associated with AD and/or its early phase, such as brain inflammation, olfactory impairments, loss of melanin/neuromelanin, G protein-coupled receptor kinase, and casein kinase 1 gamma/epsilon activities. Of particular interest, the genotype clusters of brain hemorrhage and epilepsy overlapped significantly (p value < 1E-10), and the 217 overlapping genes showed enrichment in somatic diversification of immune receptors, a known genetic trait associated with cerebral hemorrhage, epilepsy, and the early stage of AD.
CONCLUSION: This study applies unsupervised machine learning to whole-genome sequencing to identify genomic variants associated with pediatric comorbidities in Down syndrome, integrating these findings with longitudinal electronic health records from a large clinical cohort. These analyses highlight biologically plausible pathways and provide a hypothesis-generating framework for linking genotype and phenotype in this population. Although the associations for brain hemorrhage are based on a small number of cases and require replication in independent cohorts, these findings still set the stage for leveraging genetic insights to inform targeted interventions and treatments for pediatric DS patients.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13195-025-01946-w.
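The cluster-to-phenotype association step described in the methods can be illustrated with a minimal sketch. The abstract does not name the statistical test or the multiple-testing correction, so a one-sided Fisher exact (hypergeometric tail) test and the Benjamini-Hochberg procedure are assumed here. The cohort size and phenotype counts come from the abstract; the cluster size and the observed counts inside the cluster are hypothetical values chosen only for illustration, and the function names are not from the study.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided Fisher exact (hypergeometric tail) p-value.

    N: cohort size, K: patients with the phenotype,
    n: patients in the genotype cluster, k: cluster members with the phenotype.
    Returns P(X >= k) when n patients are drawn at random from the cohort.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


N_DS = 709  # DS patients in the cohort (from the abstract)
phenotype_counts = {"brain_hemorrhage": 20, "severe_epilepsy": 83}  # from the abstract
cluster_size = 50  # ASSUMPTION: hypothetical genotype cluster size
observed = {"brain_hemorrhage": 8, "severe_epilepsy": 7}  # ASSUMPTION: hypothetical counts

pvals = [
    enrichment_p(observed[name], cluster_size, count, N_DS)
    for name, count in phenotype_counts.items()
]
fdr = benjamini_hochberg(pvals)

for name, p, q in zip(phenotype_counts, pvals, fdr):
    print(f"{name}: p={p:.3g}, FDR={q:.3g}")
```

The same tail probability could, under the same assumptions, be used to assess whether two gene sets overlap more than expected by chance, with `N` as the number of genes tested and `K` and `n` as the two set sizes.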