Benchmarking Sparse Variable Selection Methods for Genomic Data Analyses

Abstract

Genomic and other studies often involve many features, and it is desirable to select the essential ones with high accuracy. In recent years, the use of Bayesian inference for variable (or feature) selection has advanced significantly. However, more practical information is needed on how to implement these methods and how they perform relative to one another. Our goal in this paper is to compare approaches, mainly from different Bayesian genres, that apply to genomic analysis. In particular, we examine how well shrinkage, global-local, and mixture priors, SuSiE, and a simple two-step procedure that we propose, RFSFS, perform in terms of several metrics: false discovery rate (FDR), false negative rate (FNR), F-score, and mean squared prediction error, under various simulation scenarios. No single method uniformly outperforms the others across all scenarios and across both variable selection and prediction performance metrics, so we order the methods by their average ranking across scenarios. We found that LASSO, the spike-and-slab prior with normal slab (SN), and RFSFS are the most competitive methods for FDR and F-score when features are uncorrelated. When features are correlated, SN, SuSiE, and RFSFS are the most competitive methods for FDR, whereas LASSO has an edge over SuSiE in terms of F-score. For illustration, we applied these methods to analyze The Cancer Genome Atlas Program (TCGA) renal cell carcinoma (RCC) data and offer methodological direction.
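To make the evaluation metrics concrete, the following is a minimal sketch of how FDR, FNR, F-score, and mean squared prediction error might be computed in a simulation where the truly relevant features are known. The function names `selection_metrics` and `mspe` are hypothetical, and the formulas are the standard textbook definitions; the paper's exact conventions (for example, how an empty selection is scored) are not stated in the abstract and are assumed here.

```python
def selection_metrics(true_support, selected):
    """Compare a selected feature set against the known true support.

    Assumption: FDR and FNR are set to 0.0 when their denominators are
    empty (nothing selected, or no true signals). The paper may differ.
    """
    true_support, selected = set(true_support), set(selected)
    tp = len(true_support & selected)   # true features that were selected
    fp = len(selected - true_support)   # noise features that were selected
    fn = len(true_support - selected)   # true features that were missed

    fdr = fp / (tp + fp) if (tp + fp) else 0.0   # false discovery rate
    fnr = fn / (tp + fn) if (tp + fn) else 0.0   # false negative rate
    denom = 2 * tp + fp + fn
    f_score = 2 * tp / denom if denom else 0.0   # harmonic mean of precision and recall
    return {"FDR": fdr, "FNR": fnr, "F": f_score}


def mspe(y_true, y_pred):
    """Mean squared prediction error on held-out responses."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy example: features 0-3 are truly relevant; a method selects 0, 1, and 4.
metrics = selection_metrics({0, 1, 2, 3}, {0, 1, 4})
error = mspe([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

In this toy case one of the three selected features is a false discovery and two of the four true features are missed, which illustrates why a method can rank well on FDR while ranking poorly on FNR or F-score.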
