Evaluating genetic-based disease prediction approaches through simulation

Abstract

Common diseases exhibit substantial heritability, and genome-wide association studies (GWAS) of these diseases have revealed hundreds of thousands of high-frequency disease susceptibility variants throughout the genome. These studies offer the prospect of using genomic data to improve disease prediction and diagnosis; however, the relative performance of different predictive modeling approaches is not well characterized. To investigate this systematically, we constructed a Monte Carlo simulation that generates model genomes with 500 SNPs carrying risk alleles, parameterized by the strength of their effects and by their mode of inheritance (additive, dominant, recessive, and combinations thereof). After generating genotypes for cases and controls, we applied several machine learning classifiers (logistic regression, naïve Bayes, random forests, and neural networks, each with and without feature selection) to predict disease phenotypes from genotypes. Each classifier's performance was evaluated and compared using the area under the receiver operating characteristic curve (AUC). We found that random forest models were the most accurate predictors of disease over the range of inheritance parameters, followed by logistic regression and naïve Bayes, while the feedforward multilayer neural network model had a lower AUC. We also investigated how AUC relates to the difference in polygenic risk score (PRS) between disease and control samples by comparing the AUC observed in the simulations with the values predicted from the PRS distributions. The relationship was monotonic and curvilinear, as predicted analytically from odds-risk and liability threshold models. When risk effects were small, the odds-risk model provided an accurate estimate of the AUC-PRS association, whereas the liability threshold model performed better when risk alleles had strong effects.

Supplementary information: The online version contains supplementary material available at 10.1007/s00439-025-02798-y.
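The link between inheritance mode, risk score, and AUC described above can be illustrated with a small simulation. The sketch below is not the authors' pipeline: it assumes equal effect sizes and allele frequencies at every SNP, a logistic risk model, and far fewer SNPs than the 500 used in the study, and it compares the rank-based AUC with the standard binormal approximation AUC = Φ(Δ / sqrt(σ_cases² + σ_controls²)), which holds only if scores are roughly normal in both groups. The function names (`encode`, `simulate`, `empirical_auc`, `binormal_auc`) and all parameter values are hypothetical and chosen for illustration.

```python
import math
import random
from statistics import NormalDist, mean, pvariance


def encode(genotype, mode):
    """Map a risk-allele count (0, 1 or 2) to a risk unit under an inheritance mode."""
    if mode == "additive":
        return genotype
    if mode == "dominant":
        return 1 if genotype >= 1 else 0
    if mode == "recessive":
        return 1 if genotype == 2 else 0
    raise ValueError(f"unknown mode: {mode}")


def simulate(n_people, n_snps, beta, mode, freq=0.3, intercept=-2.0, seed=0):
    """Return (scores, labels): a per-person risk score and 0/1 disease status.

    Assumptions (illustrative only): every SNP has the same risk-allele
    frequency `freq` and log-odds effect `beta`; disease status is drawn
    from a logistic model on the mean-centred score.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_people):
        burden = 0
        for _ in range(n_snps):
            # Two independent allele draws per SNP (Hardy-Weinberg proportions).
            g = int(rng.random() < freq) + int(rng.random() < freq)
            burden += encode(g, mode)
        scores.append(beta * burden)
    centre = mean(scores)
    labels = []
    for s in scores:
        p = 1.0 / (1.0 + math.exp(-(intercept + s - centre)))
        labels.append(int(rng.random() < p))
    return scores, labels


def empirical_auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC, using average ranks for tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def binormal_auc(scores, labels):
    """AUC predicted from case/control score means and variances (normality assumed)."""
    cases = [s for s, y in zip(scores, labels) if y]
    controls = [s for s, y in zip(scores, labels) if not y]
    delta = mean(cases) - mean(controls)
    spread = math.sqrt(pvariance(cases) + pvariance(controls))
    return NormalDist().cdf(delta / spread)


if __name__ == "__main__":
    for mode in ("additive", "dominant", "recessive"):
        for beta in (0.05, 0.1, 0.2):
            scores, labels = simulate(2000, 100, beta, mode, seed=1)
            print(mode, beta,
                  round(empirical_auc(scores, labels), 3),
                  round(binormal_auc(scores, labels), 3))
```

Running the script prints, for each inheritance mode and effect size, the simulated AUC next to the value predicted from the score distributions. Because the score here is built from the same effects that generate disease status, it represents a best-case predictor; a classifier trained on genotypes, as in the study, would have to estimate those effects from data.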
