Abstract
A long-term goal of biomedical research is to decipher how genetic processes influence disease formation. Widely available and steadily improving microarray technology can measure millions of DNA sequence variants (single-nucleotide polymorphisms, or SNPs) and thousands of gene transcripts (RNA expression microarrays) in cells. Both of these data modalities can be brought to bear on disease etiology. This paper develops a Bayesian network-based approach to integrate SNP and expression microarray data. The approach models SNP-gene interactions using a phenotype-centric network. Inferring the network consists of two steps: variable selection and network learning. The learned network illustrates how functionally dependent SNPs and genes influence each other, and also serves as a predictor of the phenotype. Applying the proposed method to a pediatric acute lymphoblastic leukemia dataset demonstrates the feasibility of our approach and its impact on biological investigation and clinical practice.