Improving data interpretability with new differential sample variance gene set tests

利用新的差异样本方差基因集检验提高数据可解释性

阅读:1

Abstract

BACKGROUND: Gene set analysis methods have played a major role in generating biological interpretations of omics data such as gene expression datasets. However, most methods focus on detecting homogenous pattern changes in mean expression while methods detecting pattern changes in variance remain poorly explored. While a few studies attempted to use gene-level variance analysis, such approach remains under-utilized. When comparing two phenotypes, gene sets with distinct changes in subgroups under one phenotype are overlooked by available methods although they reflect meaningful biological differences between two phenotypes. Multivariate sample-level variance analysis methods are needed to detect such pattern changes. RESULTS: We used ranking schemes based on minimum spanning tree to generalize the Cramer-Von Mises and Anderson-Darling univariate statistics into multivariate gene set analysis methods to detect differential sample variance or mean. We characterized the detection power and Type I error rate of these methods in addition to two methods developed earlier using simulation results with different parameters. We applied the developed methods to microarray gene expression dataset of prednisolone-resistant and prednisolone-sensitive children diagnosed with B-lineage acute lymphoblastic leukemia and bulk RNA-sequencing gene expression dataset of benign hyperplastic polyps and potentially malignant sessile serrated adenoma/polyps. One or both of the two compared phenotypes in each of these datasets have distinct molecular subtypes that contribute to within phenotype variability and to heterogeneous differences between two compared phenotypes. Our results show that methods designed to detect differential sample variance provide meaningful biological interpretations by detecting specific hallmark gene sets associated with the two compared phenotypes as documented in available literature. CONCLUSIONS: The results of this study demonstrate the usefulness of methods designed to detect differential sample variance in providing biological interpretations when biologically relevant but heterogeneous changes between two phenotypes are prevalent in specific signaling pathways. Software implementation of the methods is available with detailed documentation from Bioconductor package GSAR. The available methods are applicable to gene expression datasets in a normalized matrix form and could be used with other omics datasets in a normalized matrix form with available collection of feature sets.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。