Abstract
BACKGROUND: Metabolite production, consumption, and exchange are intimately involved in host health and disease and are key drivers of host-microbiome interactions. Although datasets that jointly measure microbiome composition and metabolites are increasingly common, computational tools for linking these data to the status of the host remain limited.

RESULTS: To address these limitations, we developed MMETHANE, a purpose-built deep learning model for predicting host status from paired microbial sequencing and metabolomic data. MMETHANE incorporates prior biological knowledge, including phylogenetic and chemical relationships, and is intrinsically interpretable, outputting an English-language set of rules that explains its decisions. Using a compendium of six datasets with paired microbial composition and metabolomics measurements, we showed that MMETHANE always performed at least on par with existing methods, including black-box machine learning techniques, and outperformed other methods on 80% of the datasets evaluated. We additionally demonstrated, through two case studies analyzing inflammatory bowel disease gut microbiome datasets, that MMETHANE uncovers biologically meaningful links between microbes, metabolites, and disease status.

CONCLUSIONS: MMETHANE is an open-source software package that brings state-of-the-art interpretable AI technologies to the microbiome field, emphasizing usability with simple written explanations of its decisions and biologically relevant visualizations. This robust and accurate tool enables investigation of the interplay between microbes, metabolites, and the host, which is critical for understanding the mechanisms of host-microbial interactions and ultimately improving the diagnosis and treatment of human diseases impacted by the microbiome.