Abstract
Complex human diseases, including cancer, are linked to genetic factors. Genome-wide association studies (GWASs) are a powerful tool for identifying genetic variants associated with cancer, but they are limited by their reliance on case-control data. We propose approaches that extend GWAS by using tumor and paired normal tissues to investigate somatic mutations. For single-marker analysis, we apply penalized maximum likelihood estimation. For multiple-marker analysis, we develop a Bayesian hierarchical model that integrates multiple markers and identifies single-nucleotide polymorphism (SNP) sets grouped by genes or pathways, improving the detection of moderate-effect SNPs. Applied to breast cancer data from The Cancer Genome Atlas (TCGA), both single- and multiple-marker analyses identify associated genes, with the multiple-marker analysis yielding results that are more consistent with external resources. The Bayesian model significantly increases the chance of new discoveries.