mzLearn as a data-driven LC/MS signal detection algorithm that enables pre-trained generative models for untargeted metabolomics

Abstract

Metabolite alterations are linked to disease, yet large-scale untargeted metabolomics remains constrained by challenges in signal detection and in integrating diverse datasets to develop pre-trained generative models. Here, we introduce mzLearn, a data-driven MS¹ signal-detection and alignment method that runs from mzML files without user-set parameters. Across 15 public datasets, mzLearn detects an average of 11,442 signals, compared with 7,100 for XCMS and 4,655 for ASARI, with a higher true-positive (TP) rate (89.0% vs 77.4% vs 49.6%) and a lower false-positive (FP) rate (12.5% vs 17.3% vs 18.8%), while correcting instrument drift across large cohorts without experimental QC samples. mzLearn detected 2,736 robust metabolite signals from 22 public studies (20,548 blood samples), enabling the development of a pre-trained variational autoencoder for untargeted metabolomics. The learned metabolite representations reflected demographic data and, when fine-tuned on unseen renal cell carcinoma data, improved risk stratification and overall survival prediction, while feature-importance analysis (SHAP) highlighted biologically plausible lipid and carnitine signals. By producing a consistent, high-quality MS¹ feature matrix at scale, mzLearn paves the way for developing pre-trained foundation models for untargeted metabolomics.
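
For readers unfamiliar with variational autoencoders, the following is a minimal sketch of the standard VAE training objective (reconstruction error plus a KL penalty on the latent code), applied to one row of a feature matrix. It is not the model described in the abstract: the function names, the squared-error reconstruction term, the `beta` weight, and the toy values are all assumptions made for illustration, and the abstract does not specify the architecture or loss actually used.

```python
import math
import random


def kl_diag_gaussian(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I), summed over latent dims."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (the reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Negative ELBO up to constants: squared-error reconstruction + beta * KL.

    x       : one sample's feature intensities (hypothetical, e.g. log-scaled)
    x_recon : decoder output for that sample
    mu, log_var : encoder outputs describing the latent Gaussian
    beta    : KL weight (assumed hyperparameter, not taken from the source)
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * kl_diag_gaussian(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.2, -1.0, 0.5]          # toy feature vector (made up)
    mu, log_var = [0.1, -0.3], [0.0, -1.0]
    z = reparameterize(mu, log_var, rng)
    x_recon = [0.1, -0.8, 0.4]    # stand-in for a decoder's output
    print(len(z), vae_loss(x, x_recon, mu, log_var))
```

In a real pipeline the encoder and decoder would be neural networks trained by gradient descent on this loss; fine-tuning on a new cohort would reuse the pre-trained weights rather than start from scratch.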
