Abstract
MOTIVATION: Traditional genome-wide association studies (GWAS) aim to uncover the genetic variants associated with a single phenotype of interest (typically a disease) and to elucidate its genotypic architecture. However, many of today's GWAS measure multiple related phenotypes simultaneously, which makes it possible to pursue the reverse aim: elucidating the "phenotypic architecture" of a single genetic variant. In other words, we may ask which combination of measured phenotypes is associated with a given genetic variant. ReverseGWAS is an algorithmic platform for answering such questions in the context of large-scale multi-phenotype GWAS.

RESULTS: We demonstrate the effectiveness of ReverseGWAS on simulated data, showing that it can identify logical combinations of phenotypes in the presence of a reasonable amount of noise. We then apply it to a selection of combined phenotypes from the UK Biobank, obtaining 719 candidate associations using autoimmune diseases and 205 using common ICD10 codes. We find that the majority of these associations (546/719 and 111/205, respectively) replicate in an independent cohort, FinnGen.

AVAILABILITY AND IMPLEMENTATION: The source code of ReverseGWAS is freely available to non-commercial users as an installable R package at https://github.com/Leonardini/rgwas.