A New Pipeline for the Normalization and Pooling of Metabolomics Data.

Authors: Viallon Vivian, His Mathilde, Rinaldi Sabina, Breeur Marie, Gicquiau Audrey, Hemon Bertrand, Overvad Kim, Tjønneland Anne, Rostgaard-Hansen Agnetha Linn, Rothwell Joseph A, Lecuyer Lucie, Severi Gianluca, Kaaks Rudolf, Johnson Theron, Schulze Matthias B, Palli Domenico, Agnoli Claudia, Panico Salvatore, Tumino Rosario, Ricceri Fulvio, Verschuren W M Monique, Engelfriet Peter, Onland-Moret Charlotte, Vermeulen Roel, Nøst Therese Haugdahl, Urbarova Ilona, Zamora-Ros Raul, Rodriguez-Barranco Miguel, Amiano Pilar, Huerta José Maria, Ardanaz Eva, Melander Olle, Ottoson Filip, Vidman Linda, Rentoft Matilda, Schmidt Julie A, Travis Ruth C, Weiderpass Elisabete, Johansson Mattias, Dossus Laure, Jenab Mazda, Gunter Marc J, Lorenzo Bermejo Justo, Scherer Dominique, Salek Reza M, Keski-Rahkonen Pekka, Ferrari Pietro
Pooling metabolomics data across studies is often desirable to increase the statistical power of the analysis. However, pooling raises methodological challenges, because several preanalytical and analytical factors can introduce differences in measured concentrations and in variability between datasets. Specifically, different studies may use different sample types (e.g., serum versus plasma) that were collected, treated, and stored according to different protocols, and assayed in different laboratories using different instruments. To address these issues, a new pipeline was developed to normalize and pool metabolomics data through a set of sequential steps: (i) exclusion of the least informative observations and metabolites, removal of outliers, and imputation of missing data; (ii) identification of the main sources of variability through principal component partial R-square (PC-PR2) analysis; and (iii) application of linear mixed models to remove unwanted variability, including that due to the samples' originating study and batch, while preserving biological variation and accounting for potential differences in the residual variances across studies. This pipeline was applied to targeted metabolomics data acquired using Biocrates AbsoluteIDQ kits in eight case-control studies nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Comprehensive examination of the metabolomics measurements indicated that the pipeline improved the comparability of data across the studies. The pipeline can be adapted to normalize other molecular data, including biomarkers as well as proteomics data, and could be used for pooling molecular datasets, for example in international consortia, to limit biases introduced by inter-study variability. This versatility makes the work of potential interest to molecular epidemiologists.
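
To make step (iii) more concrete, the following is a minimal sketch of the underlying idea for a single metabolite, not the authors' implementation. It assumes the simplest possible mixed model, a random intercept per batch, y_ij = mu + b_i + e_ij, estimates the two variance components by the one-way ANOVA method of moments, and subtracts the shrunken (BLUP) batch effects. The function name `remove_batch_effect` and the toy data are hypothetical. The sketch omits everything else the abstract mentions: biological covariates to preserve, a study-level effect, and study-specific residual variances, all of which would in practice call for a dedicated mixed-model library.

```python
from collections import defaultdict
from statistics import mean


def remove_batch_effect(values, batches):
    """Subtract estimated random batch effects from one metabolite.

    Assumed model (illustrative only): y_ij = mu + b_i + e_ij,
    with b_i ~ N(0, s2_b) and e_ij ~ N(0, s2_e).
    """
    groups = defaultdict(list)
    for y, b in zip(values, batches):
        groups[b].append(y)

    n_total, k = len(values), len(groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two batches and replicated samples")

    grand_mean = mean(values)
    batch_mean = {b: mean(ys) for b, ys in groups.items()}

    # One-way ANOVA mean squares (within and between batches).
    ss_within = sum((y - batch_mean[b]) ** 2
                    for b, ys in groups.items() for y in ys)
    ss_between = sum(len(ys) * (batch_mean[b] - grand_mean) ** 2
                     for b, ys in groups.items())
    ms_within = ss_within / (n_total - k)
    ms_between = ss_between / (k - 1)

    # Effective batch size, which handles unbalanced designs.
    n0 = (n_total - sum(len(ys) ** 2 for ys in groups.values()) / n_total) / (k - 1)
    s2_e = ms_within
    s2_b = max((ms_between - ms_within) / n0, 0.0)

    if s2_b == 0.0:
        # No detectable between-batch variance: leave the data unchanged.
        return list(values)

    # Precision weights of the batch means under the model.
    weight = {b: 1.0 / (s2_b + s2_e / len(ys)) for b, ys in groups.items()}
    # Generalized least squares estimate of the overall mean.
    mu = sum(weight[b] * batch_mean[b] for b in groups) / sum(weight.values())
    # BLUP of each batch effect: batch-mean deviation times a shrinkage factor
    # s2_b / (s2_b + s2_e / n_i), which equals s2_b * weight[b].
    effect = {b: s2_b * weight[b] * (batch_mean[b] - mu) for b in groups}

    return [y - effect[b] for y, b in zip(values, batches)]


# Toy data: batch "B" is shifted upward relative to batch "A".
toy_values = [1.0, 1.2, 0.8, 1.1, 3.0, 3.1, 2.9, 3.2]
toy_batches = ["A"] * 4 + ["B"] * 4
corrected = remove_batch_effect(toy_values, toy_batches)
gap_before = mean(toy_values[4:]) - mean(toy_values[:4])
gap_after = mean(corrected[4:]) - mean(corrected[:4])
print(f"batch gap before: {gap_before:.3f}, after: {gap_after:.3f}")
```

Because the batch effects are shrunken rather than fully subtracted, a small residual gap between batches remains; the shrinkage is stronger when within-batch noise is large relative to between-batch variance. Note that if case status or another biological factor is unevenly distributed across batches, this covariate-free sketch would also remove part of the biological signal, which is why the pipeline described above models biological variation explicitly.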
