Exposure-inducible genes may contribute to missingness in RNAseq-based gene expression analyses



Abstract

Missing gene expression values are a common issue in RNAseq-based analyses of gene expression. However, no analysis of the genetic and environmental factors contributing to missingness in RNAseq-based assessment of gene expression has been conducted. In this study, we sought to identify factors that contribute to RNAseq data missingness. We used RNAseq data from 66 lung adenocarcinoma tumors and corresponding adjacent normal lung tissues. We found a strong negative association between gene expression level and missingness, supporting the idea that borderline expression is a key contributor to missingness. A more detailed analysis showed that the relationship is more complex: the expected negative association between missingness and expression level was observed for genes with low missingness, but mean expression spiked at the right end of the distribution, which comprised genes with very high missingness. We hypothesized that genes with a high missing rate include not only genes with borderline expression but also genes that are highly expressed in some individuals and not expressed in others (true biological missingness, TBM). Three results support this hypothesis: a comparative analysis of missingness in smokers and nonsmokers, an examination of the proportion of known tobacco smoke-sensitive genes by missing rate, and gene enrichment analysis. We argue that it is beneficial to first check the data for genes with TBM. The presence of highly expressed genes with missingness indicates TBM related to inter-individual variation in gene expression level. Our results call for caution against indiscriminate imputation of missing values. When TBM is present, it is advisable to identify the affected genes and analyze them separately, because including them in imputation introduces bias: expression values will be assigned to a subset of genes that are not actually expressed.
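
The screening step recommended above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: a zero or absent value is treated as missing, and the function names, the thresholds (`min_missing_rate`, `min_expressed_mean`), and the toy gene labels are all hypothetical. A gene is flagged as a candidate for TBM when it is missing in many samples yet highly expressed in the samples where it is detected.

```python
from statistics import mean


def gene_missingness(values):
    """Return (missing_rate, mean expression among expressing samples) for one gene.

    Assumption: a value of None or 0 counts as missing.
    """
    observed = [v for v in values if v is not None and v > 0]
    missing_rate = 1 - len(observed) / len(values)
    mean_observed = mean(observed) if observed else 0.0
    return missing_rate, mean_observed


def flag_candidate_tbm(expr, min_missing_rate=0.5, min_expressed_mean=50.0):
    """Flag genes with high missingness AND high expression where detected.

    Both thresholds are hypothetical and would need tuning for real data.
    """
    flagged = []
    for gene, values in expr.items():
        rate, expressed_mean = gene_missingness(values)
        if rate >= min_missing_rate and expressed_mean >= min_expressed_mean:
            flagged.append(gene)
    return flagged


# Toy data: one row of per-sample expression values per gene (made up).
expr = {
    "GENE_A": [120.0, 0, 0, 95.0, 0, 0],              # high in some samples, absent in others
    "GENE_B": [1.0, 0, 2.0, 0, 0, 0],                 # borderline expression
    "GENE_C": [30.0, 42.0, 28.0, 35.0, 40.0, 31.0],   # detected in every sample
}
candidates = flag_candidate_tbm(expr)
print(candidates)
```

Genes flagged this way would be set aside and analyzed separately before any imputation is run on the remaining genes, whose missingness is more plausibly due to borderline expression.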
