Abstract
The prevalence of synthetic associations in GWAS remains unexplored. In a synthetic association, a non-causal variant becomes significant because it tags multiple undetected causal variants, without necessarily being in strong linkage disequilibrium with any single one of them. We introduce a novel machine-learning approach that uses only genotype data to infer such associations in human GWAS. Our analysis suggests that 3-5% of GWAS Catalog peaks may be synthetic associations. These often arise from epistatic interactions between common variants rather than from multiple rare variants acting independently. Our findings highlight the need for multi-locus models. They also call for careful interpretation of GWAS results and of follow-up analyses such as fine-mapping and trait prediction.