Abstract
The functions of microbial communities, such as substrate conversion and pathogen suppression, are not a simple sum of individual species' capabilities; they emerge from complex interspecies interactions. Understanding how these functions arise from individual species and their interactions remains a major challenge, which limits a rational understanding of microbial roles in both natural and engineered ecosystems. Current holistic (meta-omics) and reductionist (isolation- or single-cell-based) approaches struggle to capture these emergent community functions. This study therefore explores an intermediate strategy: analyzing simple subcommunities to build a bottom-up understanding of community-level functions. To test the validity of this approach, we used a nine-member synthetic microbial community capable of degrading the environmental pollutant aniline and systematically generated a dataset of 256 subcommunity combinations and their associated functions. Random forest models revealed that subcommunities of just three to four species enabled quantitative prediction of functions in larger communities (five to nine members; Pearson's r = 0.78–0.80). Prediction performance remained robust even when subcommunity data were limited, suggesting that the approach could apply to more diverse microbial communities in which exhaustive subcommunity observation is infeasible. Moreover, interpreting models trained on these simple subcommunities identified key species and interspecies interactions that strongly influence overall community function. These findings provide a methodological framework for mechanistically dissecting complex microbial community functions through subcommunity-based analysis.
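The train-on-small, predict-on-large analysis summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: each community is encoded as a nine-dimensional presence/absence vector, scikit-learn's `RandomForestRegressor` is the model, and the measured aniline-degradation data are replaced by an invented toy function (random additive species effects plus three hypothetical pairwise interactions). The toy enumerates all 511 non-empty subsets, whereas the study reports 256 measured subcommunity combinations. The split (training on subcommunities of at most four species, testing on five- to nine-member communities) and the use of impurity-based feature importances to rank species are also assumptions made for illustration.

```python
import itertools

import numpy as np
from sklearn.ensemble import RandomForestRegressor

N_SPECIES = 9
rng = np.random.default_rng(0)

# HYPOTHETICAL toy ground truth: additive species effects plus a few pairwise
# interactions. These values are invented and are NOT the study's data.
main_effects = rng.normal(0.5, 0.3, N_SPECIES)
pair_effects = {(0, 3): 1.5, (2, 5): -1.0, (1, 7): 0.8}


def toy_function(x):
    """Toy 'community function' for a presence/absence vector x (assumed model)."""
    value = float(main_effects @ x)
    for (i, j), weight in pair_effects.items():
        value += weight * x[i] * x[j]  # interaction counts only if both species are present
    return value + rng.normal(0, 0.1)  # small measurement noise


# Enumerate every non-empty subcommunity as a 0/1 vector (2**9 - 1 = 511 rows).
X = np.array(
    [x for x in itertools.product([0, 1], repeat=N_SPECIES) if any(x)],
    dtype=float,
)
y = np.array([toy_function(x) for x in X])
richness = X.sum(axis=1)  # number of species present in each community

# Assumed split: fit on small subcommunities, evaluate on larger communities.
train = richness <= 4
test = richness >= 5

model = RandomForestRegressor(n_estimators=300, random_state=0)
model.fit(X[train], y[train])
pred = model.predict(X[test])

# Pearson correlation between predicted and toy "observed" function.
r = np.corrcoef(pred, y[test])[0, 1]
print(f"Pearson r on 5-9 member communities: {r:.2f}")

# Rank species by impurity-based importance as a rough proxy for "key species".
ranking = np.argsort(model.feature_importances_)[::-1]
print("species ranked by importance:", ranking.tolist())
```

Random forests cannot predict values outside the range of their training targets, so predictions for the largest communities may be compressed even when their ordering is preserved. A correlation metric such as Pearson's r is less affected by such compression than an absolute-error metric.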