mzLearn as a data-driven LC/MS signal detection algorithm that enables pre-trained generative models for untargeted metabolomics



Authors: Leila Pirhaji, Jonah Eaton, Adarsh K Jeewajee, Min Zhang, Matthew Morris, Maria Karasarides

Journal: Communications Chemistry
Impact factor: 6.200
Year: 2025
Citation: 2025 Dec 18;8(1):398
doi: 10.1038/s42004-025-01791-w
Research area: signal transduction, metabolism
Disease type: multiple sclerosis

Abstract

Metabolite alterations are linked to diseases, yet large-scale untargeted metabolomics remains constrained by challenges in signal detection and integration of diverse datasets for developing pre-trained generative models. Here, we introduce mzLearn, a data-driven MS¹ signal-detection and alignment method that runs from mzML files without user-set parameters. Across 15 public datasets, mzLearn detects 11,442 signals on average vs 7,100 (XCMS) and 4,655 (ASARI), with higher TP (89.0% vs 77.4% vs 49.6%) and lower FP (12.5% vs 17.3% vs 18.8%), while correcting instrument drifts across large cohorts without experimental QC samples. mzLearn detected 2,736 robust metabolite signals from 22 public studies (20,548 blood samples), enabling the development of a pre-trained variational autoencoder for untargeted metabolomics. Learned metabolite representations reflected demographic data and, when fine-tuned on unseen renal cell carcinoma data, improved risk stratification and overall survival predictions, while feature-importance analysis (SHAP) highlighted biologically plausible lipid and carnitine signals. By producing a consistent, high-quality MS¹ feature matrix at scale, mzLearn paves the way for developing pre-trained foundation models for untargeted metabolomics.
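The abstract mentions a pre-trained variational autoencoder but does not describe its architecture or training objective. As a rough illustration only, the sketch below computes the textbook VAE loss for one sample: a reconstruction term plus the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior. The function names, the squared-error reconstruction term, and the `beta` weight are assumptions made for this example and are not taken from mzLearn.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps elementwise, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def reconstruction_error(x, x_hat):
    """Sum of squared errors between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Generic VAE loss: reconstruction error plus beta-weighted KL term.

    x       -- hypothetical vector of signal intensities for one sample
    x_hat   -- the decoder's reconstruction of x
    mu      -- encoder mean of the latent posterior
    log_var -- encoder log-variance of the latent posterior
    beta    -- assumed weight on the KL term (1.0 gives the standard loss)
    """
    return reconstruction_error(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.2, -0.1], [0.0, -1.0]
    z = reparameterize(mu, log_var, rng)
    print("latent sample:", z)
    print("loss:", vae_loss([1.0, 2.0], [0.9, 2.1], mu, log_var))
```

In a pre-train then fine-tune setup like the one the abstract describes, a loss of this general shape would be minimized on the large unlabeled feature matrix first, and the learned encoder would then be reused for a downstream task.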
