Abstract
Parkinson disease (PD) is a progressive neurodegenerative disease with an incompletely understood genetic architecture that necessitates novel discovery methods. We introduce an explainable machine learning framework that uses single-cell/nuclei RNA sequencing (sc/snRNAseq) to identify molecular markers of diseased cells and nominate candidate genes for targeted genomic analysis. Application to four snRNAseq datasets characterizing the post-mortem midbrain identified cell type-specific gene sets that consistently distinguished PD cells from healthy cells across all datasets (mean balanced accuracy = 0.92) and highlighted ten novel candidate genes in PD. Among these, GPC6 was identified as a marker of PD dopaminergic neurons and a member of the heparan sulfate proteoglycan family, implicated in the intracellular accumulation of α-synuclein preformed fibrils, a hallmark of PD. We further validated the enrichment of rare GPC6 variants in PD across three case-control cohorts. This open-source framework is broadly applicable across diseases and promises to accelerate gene discovery in complex diseases.