Improving the computational efficiency of fully Bayes inference and assessing the effect of misspecification of hyperparameters in whole-genome prediction models

Abstract

BACKGROUND: The reliability of whole-genome prediction (WGP) models based on high-density single nucleotide polymorphism (SNP) panels depends critically on proper specification of key hyperparameters. A currently popular WGP model, BayesB, specifies a hyperparameter π that is loosely used to describe the proportion of SNPs that are in linkage disequilibrium (LD) with causal variants. The remaining markers are specified to be random draws from a Student t distribution whose key hyperparameters are the degrees of freedom ν and the scale s².

METHODS: We consider three alternative Markov chain Monte Carlo (MCMC) approaches that use Metropolis-Hastings (MH) to estimate these key hyperparameters. The first approach, termed DFMH, follows a previously published strategy in which s² is drawn by a Gibbs step and ν is drawn by an MH step. The second strategy, termed UNIMH, substitutes MH for Gibbs when drawing s² and further collapses (marginalizes) the full conditional density of ν. The third strategy, termed BIVMH, draws the two hyperparameters jointly in a bivariate MH step. We also tested the effect of misspecifying s² on the accuracy of genomic estimated breeding values (GEBV), while still allowing inference on the other hyperparameters.

RESULTS: The UNIMH and BIVMH strategies had significantly greater (P < 0.05) computational efficiencies for estimating ν and s² than DFMH in both BayesA (π = 1) and BayesB implementations. We drew similar conclusions from an analysis of the public domain heterogeneous stock mice data. We also found significant drops (P < 0.01) in the accuracy of GEBV under BayesA when s² was overspecified, whereas BayesB was more robust to such misspecification. However, underspecifying s² was compensated by counterbalancing inferences on ν in BayesA and BayesB, and on π in BayesB.

CONCLUSIONS: Sampling strategies based solely on MH updates of ν and s², together with collapsed representations of full conditional densities, can improve the computational efficiency of MCMC relative to the use of Gibbs updates. We believe that proper inferences on s², ν and π are vital to ensure that the accuracy of GEBV is maximized when using parametric WGP models.
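
To make the Gibbs-plus-MH idea behind the DFMH strategy concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that SNP-specific variances follow a scaled inverse chi-square distribution with degrees of freedom `nu` and scale `s2`, a flat prior on `s2` (which gives a Gamma full conditional), and a prior on `nu` proportional to (1 + nu)^-2; none of these priors is stated in the abstract. The function names (`log_cond_nu`, `draw_s2_gibbs`, `draw_nu_mh`, `run_chain`) are hypothetical, and the SNP variances are held fixed here, whereas a full BayesA or BayesB sampler would also update them at every iteration.

```python
import math
import random


def log_cond_nu(nu, s2, sigma2):
    """Log full conditional of nu, up to an additive constant."""
    m = len(sigma2)
    sum_log = sum(math.log(v) for v in sigma2)
    sum_inv = sum(1.0 / v for v in sigma2)
    # Scaled inverse chi-square log density summed over the m SNP variances.
    loglik = (
        m * (0.5 * nu * math.log(0.5 * nu * s2) - math.lgamma(0.5 * nu))
        - (0.5 * nu + 1.0) * sum_log
        - 0.5 * nu * s2 * sum_inv
    )
    log_prior = -2.0 * math.log(1.0 + nu)  # ASSUMED prior: p(nu) ~ (1 + nu)^-2
    return loglik + log_prior


def draw_s2_gibbs(nu, sigma2, rng):
    """Gibbs draw of s2 from its Gamma full conditional (ASSUMES a flat prior)."""
    shape = 0.5 * len(sigma2) * nu + 1.0
    rate = 0.5 * nu * sum(1.0 / v for v in sigma2)
    return rng.gammavariate(shape, 1.0 / rate)  # second argument is scale = 1 / rate


def draw_nu_mh(nu, s2, sigma2, rng, step=0.3):
    """Random-walk MH update of nu on the log scale."""
    prop = nu * math.exp(step * rng.gauss(0.0, 1.0))
    log_ratio = (
        log_cond_nu(prop, s2, sigma2) - log_cond_nu(nu, s2, sigma2)
        + math.log(prop) - math.log(nu)  # Jacobian of the log transform
    )
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    return prop if math.log(1.0 - rng.random()) < log_ratio else nu


def run_chain(sigma2, n_iter=2000, seed=0, nu0=2.0):
    """Alternate the Gibbs step for s2 and the MH step for nu."""
    rng = random.Random(seed)
    nu, nu_draws, s2_draws = nu0, [], []
    for _ in range(n_iter):
        s2 = draw_s2_gibbs(nu, sigma2, rng)
        nu = draw_nu_mh(nu, s2, sigma2, rng)
        nu_draws.append(nu)
        s2_draws.append(s2)
    return nu_draws, s2_draws


if __name__ == "__main__":
    # Simulate 500 SNP variances with illustrative values nu = 4, s2 = 0.01.
    sim = random.Random(1)
    true_nu, true_s2 = 4.0, 0.01
    sigma2 = [true_nu * true_s2 / sim.gammavariate(true_nu / 2.0, 2.0)
              for _ in range(500)]
    nu_draws, s2_draws = run_chain(sigma2, n_iter=2000, seed=2)
    kept_nu, kept_s2 = nu_draws[1000:], s2_draws[1000:]
    print("posterior mean nu:", sum(kept_nu) / len(kept_nu))
    print("posterior mean s2:", sum(kept_s2) / len(kept_s2))
```

Because each hyperparameter is updated conditionally on the other, strong posterior dependence between the two can slow mixing in a scheme of this kind, which is the sort of inefficiency that joint or collapsed MH updates aim to reduce.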
