Abstract
Longitudinal microbiome data offer a unique opportunity to explore the dynamic interactions between microbial communities and disease progression. However, such data are often affected by missing values, sparse signals, and limited interpretability, which hinder effective biomarker discovery and accurate disease modeling. To address these challenges, we propose SysLM, a comprehensive deep learning framework for the systematic analysis of longitudinal microbiome data. SysLM comprises two synergistic modules, SysLM-I and SysLM-C. SysLM-I performs missing-value imputation: it combines metadata with three feature enhancement strategies and captures temporal causality and long-term dependencies through Temporal Convolutional Network (TCN) and Bidirectional Long Short-Term Memory (BiLSTM) modules. SysLM-C integrates deep learning with causal inference modeling to construct three causal spaces, which it uses for classification and for screening multiple types of biomarkers: differential microbiome biomarkers, network biomarkers, core biomarkers, dynamic biomarkers, disease-specific biomarkers, and shared biomarkers. SysLM achieves superior performance in imputation, classification, and biomarker discovery across multiple datasets. Importantly, it uncovers novel microbial mechanisms underlying ulcerative colitis, highlighting its value for precision medicine. By integrating deep learning with causal modeling, SysLM offers a promising approach for advancing microbiome-based disease research and facilitating the development of targeted therapeutic strategies.