Abstract
Saliva harbors a complex human microbiota that is closely linked to the onset and progression of various diseases. This meta-analysis of public saliva 16S rRNA gene sequencing data aimed to broaden understanding of the associations between the salivary microbiota and multiple diseases, and to explore the potential of salivary microbes as molecular markers for multi-disease prediction, thereby overcoming a limitation of studies that focus on a single disease. Of the studies indexed in PubMed between 2016 and 2024, 22 cohorts met the screening criteria (13 sequencing the V3-V4 region and 9 sequencing the V4 region), comprising 7,750 samples in total. Bioinformatic analyses using QIIME2 and Wekemo, together with statistical modeling, characterized the salivary microbial communities, identified core microbes in the negative control group, and yielded a multi-disease prediction model based on 16S data. The key findings were: (1) microbiota structure differed significantly across physiological and pathological states (e.g., at the genus level, the NPC groups resembled the controls but diverged from the colorectal cancer and PLHIV groups); (2) nine core taxa, including g:Streptococcus and g:Haemophilus_D_735815, were identified in saliva samples from the negative control group; and (3) multi-class random forest models achieved robust classification performance (AUC 0.898-0.995 for V3-V4 and 0.957-1 for V4). This study validated the feasibility of establishing healthy baselines from the salivary microbiota and of using machine learning for non-invasive disease diagnosis. Future research should expand disease coverage, increase sample sizes, and further investigate microbiota-disease associations to advance the development of non-invasive diagnostics.
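One plausible reading of the reported AUC ranges is that each range spans the per-class, one-vs-rest AUCs of a multi-class random forest: every group in turn is treated as "positive" and all remaining groups as "negative." This interpretation is an assumption, not something the abstract states. The minimal Python sketch below shows how such per-class AUCs can be computed from class probabilities. It does not train a random forest. The probabilities could come from any multi-class classifier, for example scikit-learn's `RandomForestClassifier.predict_proba`. The group labels and probability values are hypothetical toy numbers, not study data.

```python
from typing import Dict, List, Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney formulation: the probability that a randomly
    chosen positive sample scores higher than a randomly chosen negative one
    (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(proba: List[Dict[str, float]], y_true: List[str]) -> Dict[str, float]:
    """One AUC per class: that class is 'positive', all other classes are 'negative'."""
    classes = sorted(set(y_true))
    return {
        c: binary_auc([row[c] for row in proba], [1 if y == c else 0 for y in y_true])
        for c in classes
    }


# Hypothetical group labels and made-up class probabilities (NOT study data),
# standing in for the output of a multi-class classifier such as a random forest.
y_true = ["control", "control", "CRC", "CRC", "PLHIV", "PLHIV"]
proba = [
    {"control": 0.7, "CRC": 0.2, "PLHIV": 0.1},
    {"control": 0.5, "CRC": 0.3, "PLHIV": 0.2},
    {"control": 0.2, "CRC": 0.6, "PLHIV": 0.2},
    {"control": 0.5, "CRC": 0.25, "PLHIV": 0.25},
    {"control": 0.1, "CRC": 0.2, "PLHIV": 0.7},
    {"control": 0.3, "CRC": 0.1, "PLHIV": 0.6},
]

aucs = one_vs_rest_auc(proba, y_true)
print(aucs)                                # {'CRC': 0.875, 'PLHIV': 1.0, 'control': 0.9375}
print(min(aucs.values()), max(aucs.values()))  # 0.875 1.0
```

Under this reading, the reported range is the minimum and maximum of these per-class values. By contrast, scikit-learn's `roc_auc_score` with `multi_class="ovr"` returns a single averaged value rather than a range.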