Characterizing the omics landscape based on 10,000+ datasets

基于10000多个数据集描述组学格局

阅读:3

Abstract

The characteristics of data produced by omics technologies are pivotal, as they critically influence the feasibility and effectiveness of computational methods applied in downstream analyses, such as data harmonization and differential abundance analyses. Furthermore, variability in these data characteristics across datasets plays a crucial role, leading to diverging outcomes in benchmarking studies, which are essential for guiding the selection of appropriate analysis methods in all omics fields. Additionally, downstream analysis tools are often developed and applied within specific omics communities due to the presumed differences in data characteristics attributed to each omics technology. In this study, we investigate over ten thousand datasets to understand how proteomics, metabolomics, lipidomics, transcriptomics, and microbiome data vary in specific data characteristics. We were able to show patterns of data characteristics specific to the investigated omics types and provide a tool that enables researchers to assess how representative a given omics dataset is for its respective discipline. Moreover, we illustrate how data characteristics can impact analyses at the example of normalization in the presence of sample-dependent proportions of missing values. Given the variability of omics data characteristics, we encourage the systematic inspection of these characteristics in benchmark studies and for downstream analyses to prevent suboptimal method selection and unintended bias.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。