A multivariate Bernoulli model to predict DNaseI hypersensitivity status from haplotype data



Abstract

MOTIVATION: Haplotype models enjoy a wide range of applications in population inference and disease gene discovery. The hidden Markov models traditionally used for haplotypes are hindered by the dubious assumption that dependencies occur only between consecutive pairs of variants. In this article, we apply the multivariate Bernoulli (MVB) distribution to model haplotype data. The MVB distribution relies on interactions among all sets of variants, thus allowing for the detection and exploitation of long-range and higher-order interactions. We discuss penalized estimation and present an efficient algorithm for fitting sparse versions of the MVB distribution to haplotype data. Finally, we showcase the benefits of the MVB model in predicting DNaseI hypersensitivity (DH) status, an epigenetic mark describing chromatin accessibility, from population-scale haplotype data.

RESULTS: We fit the MVB model to real data from 59 individuals on whom both haplotypes and DH status in lymphoblastoid cell lines are publicly available. The model allows prediction of DH status from genetic data (prediction R² = 0.12 in cross-validations). Comparisons of prediction under the MVB model with prediction under linear regression (best linear unbiased prediction) and logistic regression demonstrate that the MVB model achieves about 10% higher prediction R² than the two competing methods in empirical data.

AVAILABILITY AND IMPLEMENTATION: Software implementing the method described can be downloaded at http://bogdan.bioinformatics.ucla.edu/software/.

CONTACT: shihuwenbo@ucla.edu or pasaniuc@ucla.edu.
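
To make the role of interactions "among all sets of variants" concrete, here is a minimal Python sketch of the general log-linear form of a multivariate Bernoulli distribution: the log-probability of a haplotype is the sum of coefficients over every subset of variants that are all equal to 1, minus a normalizing constant. This is an illustration only, not the authors' software or their fitting algorithm. The function names, the three-variant example, and the coefficient values are all hypothetical, and the brute-force normalization is feasible only for a handful of variants.

```python
from itertools import product
import math

def mvb_log_potential(y, f):
    """Unnormalized log-probability of haplotype y (a tuple of 0/1 alleles).

    f maps a tuple of variant indices (a subset) to its interaction
    coefficient; subsets missing from f are treated as zero, which is
    what makes the model sparse.
    """
    return sum(coef for subset, coef in f.items() if all(y[j] for j in subset))

def mvb_distribution(m, f):
    """Exact probabilities over all 2**m haplotypes by enumeration."""
    haplotypes = list(product((0, 1), repeat=m))
    log_pots = [mvb_log_potential(y, f) for y in haplotypes]
    log_z = math.log(sum(math.exp(v) for v in log_pots))  # normalizing constant
    return {y: math.exp(v - log_z) for y, v in zip(haplotypes, log_pots)}

# Hypothetical coefficients: three main effects plus one long-range
# pairwise interaction between variants 0 and 2 (non-adjacent).
f = {(0,): -1.0, (1,): 0.5, (2,): -0.5, (0, 2): 2.0}
probs = mvb_distribution(3, f)

# The interaction coefficient is the change in the log-odds of variant 0
# when variant 2 switches from 0 to 1 (variant 1 held fixed at 0).
log_odds_shift = (
    math.log(probs[(1, 0, 1)] / probs[(0, 0, 1)])
    - math.log(probs[(1, 0, 0)] / probs[(0, 0, 0)])
)
```

In this toy setting the coefficient on the pair (0, 2) couples two variants that are not adjacent, which is the kind of dependency a first-order hidden Markov model over consecutive variants does not represent directly.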
