MGM as a Large-Scale Pretrained Foundation Model for Microbiome Analyses in Diverse Contexts


Abstract

Microbial communities are integral to human health, biotechnology, and environmental systems, yet their analysis is hindered by data heterogeneity and batch effects across studies. Traditional supervised methods often fail to capture universal patterns, limiting their utility in diverse contexts. Here, we present the Microbial General Model (MGM), the first large-scale foundation model for microbiome analysis, pretrained on 260,000 samples using transformer-based language modeling. MGM employs self-attention mechanisms and autoregressive pretraining to learn contextualized representations of microbial compositions, enabling robust transfer learning for downstream tasks. Benchmark evaluations demonstrate MGM's superior performance over conventional methods (average ROC-AUC = 0.99 vs. 0.68-0.97) in microbial community classification, with enhanced generalization across geographic regions. MGM also captures spatial and temporal microbial dynamics, as evidenced by its application to a longitudinal infant cohort, where it delineated delivery-mode-specific microbiome trajectories and identified keystone genera such as Bacteroides and Bifidobacterium in vaginal deliveries and Haemophilus in cesarean deliveries. Furthermore, through prompt-guided generation, MGM produced realistic microbial profiles conditioned on disease labels. By integrating self-supervised learning with domain-specific fine-tuning, MGM advances the scalability and precision of microbiome analyses, offering a unified framework for diagnostics, ecological studies, and therapeutic discovery.
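To make the "language modeling over microbial compositions" idea concrete, the sketch below shows one common way such a model could tokenize a sample: genera are ranked by relative abundance and mapped to integer token IDs, producing a sequence that an autoregressive transformer can be trained on with next-token prediction. This is a minimal illustrative sketch, not MGM's actual pipeline; the vocabulary, function name, and example abundances are assumptions chosen for clarity.

```python
# Hypothetical sketch: turn a relative-abundance profile into a token
# sequence for autoregressive (next-token) pretraining. The vocabulary
# and sample values below are illustrative, not taken from MGM.

def profile_to_tokens(abundances, vocab):
    """Rank genera by relative abundance (descending) and map each
    present genus to its integer token ID."""
    ranked = sorted(abundances, key=abundances.get, reverse=True)
    return [vocab[g] for g in ranked if abundances[g] > 0]

# Assumed toy vocabulary of genus -> token ID.
vocab = {"Bacteroides": 1, "Bifidobacterium": 2,
         "Haemophilus": 3, "Escherichia": 4}

# One assumed sample (relative abundances summing to <= 1).
sample = {"Bifidobacterium": 0.45, "Bacteroides": 0.30,
          "Escherichia": 0.20, "Haemophilus": 0.0}

tokens = profile_to_tokens(sample, vocab)
# Autoregressive pretraining target: predict each token from its
# predecessors, i.e. model P(token_t | token_1 .. token_{t-1}).
inputs, targets = tokens[:-1], tokens[1:]
print(tokens)  # [2, 1, 4]
```

Rank-based tokenization of this kind discards absolute abundances but yields a fixed discrete vocabulary, which is what lets transformer-style self-attention and transfer learning apply directly to compositional microbiome data.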
