Composition-on-composition regression analysis for multi-omics integration of metagenomic data

Abstract

MOTIVATION: Compositional data arise in many disciplines, for example in the next-generation sequencing experiments widely used in biomedical studies. Regression analysis with compositional data as either the response or the predictors has been well studied. However, when both responses and predictors are compositional, few analysis tools exist, especially in the high-dimensional setting. Most of the existing methods rely on a log-ratio transformation to map compositional data from the simplex to real space. A serious weakness of these methods is that they cannot handle the substantial fraction of zeroes observed in data collected from next-generation sequencing experiments.

RESULTS: To investigate associations between two high-dimensional multi-omics compositions, we propose a composition-on-composition (COC) regression method that does not require log-ratio transformations and can therefore handle zeroes in the data. To account for high dimensionality, we estimate the regression coefficients using a penalized estimation equation approach. We also propose inference procedures for COC regression. Comprehensive numerical simulations and case studies demonstrate the superior performance of COC.

AVAILABILITY AND IMPLEMENTATION: R source code implementing the COC method is available at https://github.com/nrios4/COC.
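
The zero problem described in the motivation can be seen in a few lines. The sketch below is only an illustration and is not taken from the COC repository: it applies a centered log-ratio (CLR) transform, one common log-ratio transformation, to two made-up compositions. The function name `clr` and the example values are assumptions chosen for this illustration.

```python
import math

def clr(composition):
    """Centered log-ratio: log of each part minus the mean log over all parts."""
    # math.log(0.0) raises ValueError, so any zero part makes the transform undefined.
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Hypothetical relative abundances; each composition sums to 1.
positive = [0.5, 0.3, 0.2]
with_zero = [0.7, 0.3, 0.0]

clr_positive = clr(positive)  # well defined: all parts are strictly positive

try:
    clr_with_zero = clr(with_zero)
except ValueError:
    # The zero part has no logarithm, so the transform cannot be computed.
    clr_with_zero = None
```

Workarounds such as adding a small pseudocount before taking logs exist, but they alter the data; a method that works on the compositions directly, as the abstract describes for COC, avoids that step.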
