Abstract
Identity-by-descent (IBD) mapping provides signals complementary to those of genome-wide association studies (GWASs) when multiple causal haplotypes or variants are present but not directly tested. We propose the difference between affected-affected and control-control IBD rates as an IBD mapping statistic. For our hypothesis test, we use a computationally efficient approach from the stochastic processes literature to derive genome-wide significance levels that control the family-wise error rate (FWER). Whole-genome simulations indicate that our method conservatively controls the FWER. We pair our IBD mapping approach with a selection scan and a validation procedure based on phenotype randomization, so that results can be contrasted for evidence of confounding due to positive selection, sequencing artifacts, or population structure. We developed automated and reproducible workflows to phase haplotypes, call local ancestry probabilities, and perform the IBD mapping scan; the first two of these tasks are important preprocessing steps for haplotype analyses. We applied our methods to search for Alzheimer disease (AD) risk loci in the Alzheimer's Disease Sequencing Project (ADSP) genome data. We identified six genome-wide significant signals of AD risk among samples genetically similar to African and European reference populations and among self-identified Amish samples. Some variants at the six risk loci we detected have previously been associated with AD, dementia, and memory decline, and four genes at two of these loci have already been nominated as therapeutic targets for AD. Overall, our scalable approach makes further use of large consortium resources, which are expensive to collect but provide insights into disease mechanisms.
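The mapping statistic above is the difference between affected-affected and control-control IBD rates. The Python sketch below illustrates that definition at a single genomic position. It rests on assumptions that are ours, not the authors': IBD status is supplied as a set of sample-index pairs that share a segment over the position, each rate is taken over all pairs of individuals within a group (haplotype-level bookkeeping is omitted), and phenotype randomization is approximated by shuffling case/control labels. The helper names `ibd_rate`, `ibd_difference`, and `permuted_differences` are hypothetical, and the sketch does not reproduce any standardization of the statistic or the stochastic-process derivation of genome-wide significance levels.

```python
import random
from math import comb


def ibd_rate(ibd_pairs, members):
    """Fraction of all pairs within `members` that share an IBD segment
    at the focal position. `ibd_pairs` is a set of unique (i, j) tuples
    with i != j (assumed input format)."""
    member_set = set(members)
    n_pairs = comb(len(member_set), 2)
    if n_pairs == 0:
        return 0.0
    # Count IBD pairs whose two samples both belong to the group.
    shared = sum(1 for i, j in ibd_pairs if i in member_set and j in member_set)
    return shared / n_pairs


def ibd_difference(ibd_pairs, phenotype):
    """Affected-affected minus control-control IBD rate at one position.
    `phenotype[k]` is 1 for an affected sample and 0 for a control."""
    cases = [k for k, y in enumerate(phenotype) if y == 1]
    controls = [k for k, y in enumerate(phenotype) if y == 0]
    return ibd_rate(ibd_pairs, cases) - ibd_rate(ibd_pairs, controls)


def permuted_differences(ibd_pairs, phenotype, n_perm=100, seed=0):
    """Recompute the statistic after shuffling phenotype labels; a simple
    stand-in for a phenotype-randomization check (hypothetical helper)."""
    rng = random.Random(seed)
    labels = list(phenotype)
    results = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        results.append(ibd_difference(ibd_pairs, labels))
    return results


# Toy example: three cases (0, 1, 2) all share IBD with each other,
# while only one of the three control pairs (3, 4) shares IBD.
pairs = {(0, 1), (0, 2), (1, 2), (3, 4)}
pheno = [1, 1, 1, 0, 0, 0]
print(ibd_difference(pairs, pheno))  # 1.0 - 1/3
```

In the toy example the case rate is 3/3 and the control rate is 1/3, so the difference is about 0.667. Shuffled-label values from `permuted_differences` give a rough sense of how large the difference can be when phenotype is unrelated to IBD sharing, which is the intuition behind a randomization-based validation.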